Submitting jobs to Amarel

Table of contents

  1. Slurm scheduler
    1. Choose your partition
    2. Array jobs
    3. Selecting nodes based on features

Slurm scheduler

Amarel uses the popular Slurm Workload Manager. Slurm has its own excellent documentation, which is worth consulting for details. Below, we include a template shell script that you can use to submit jobs to Amarel with Slurm.

Template Slurm script: submit_array_jobs.sh

There are a few things to pay attention to in this script. First is the Slurm header, which comprises all the lines at the top that start with #SBATCH. Each of these lines stores a setting that the Slurm scheduler uses to configure your job (e.g., how many CPU cores to use, how long the job is allowed to run, etc.).

Important

If there is a space after the # (e.g., # SBATCH), then that line of the header is not active (i.e., it is treated as an ordinary code comment). If there is no space (e.g., #SBATCH), then the line is active and will be read by the Slurm scheduler.
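
For reference, here is a minimal sketch of what the top of a Slurm script might look like. The values shown are placeholders and the exact settings in submit_array_jobs.sh may differ; the lines control the partition, job name, CPU cores, memory, wall time, log file, and the array range, respectively.

#!/bin/bash
# a minimal sketch of a Slurm header; adapt the values to your job
#SBATCH --partition=main
#SBATCH --job-name=my_job
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --output=slurm.%A_%a.out
#SBATCH --array=0-499

# the rest of the script is ordinary bash; Slurm sets SLURM_ARRAY_TASK_ID for each array task
echo "Running array task ${SLURM_ARRAY_TASK_ID}"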

Choose your partition

The first thing to consider is which Amarel partition you want to submit your jobs to. In the script, there are three options that you can toggle by adding/removing spaces in the Slurm header:

# first, choose partition (note the space between # and SBATCH!)
#SBATCH --partition=main
# SBATCH --partition=p_dz268_1
# SBATCH --partition=gpu

# If partition=gpu, then you need to specify how many GPUs you want and what type.
# Here, gres=gpu:1 means "give me 1 gpu of any type". 
# Note, this will throw an error if partition=main.
# SBATCH --gres=gpu:1
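
# (a sketch, not part of the original template: if your job needs more than
#  one GPU, you can increase the count, e.g. 2 GPUs of any type)
# SBATCH --gres=gpu:2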

If you use partition=main, your job will be submitted to the main Amarel pool, which is shared by all of Rutgers. It is a very large pool, but we do not have exclusive access to it. If you use partition=p_dz268_1, your job will be submitted to the CAHBIR pool. This pool has only 640 cores (main has many more), but only members of CAHBIR have access to it, so there is less competition. Finally, if you use partition=gpu, your job will be submitted to a pool of nodes with GPU support. Like main, this pool is shared with all of Rutgers, but in our experience it doesn’t get used much! Amarel is really well equipped with GPUs, so this pool can be powerful if your jobs require CUDA acceleration.

Important

If you’re having trouble submitting to partition=p_dz268_1, that means you haven’t been added to the CAHBIR pool yet! Let Linden know and he will fix that.
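
If you want to see how busy each partition is before submitting, standard Slurm commands like sinfo and squeue can help. A quick sketch (run from an Amarel login node; partition names as above):

# node counts by state (allocated/idle/other) for each partition
sinfo --partition=main,p_dz268_1,gpu --summarize

# jobs currently queued or running in the CAHBIR pool
squeue --partition=p_dz268_1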

Array jobs

After choosing your partition, the rest of the Slurm header has a number of settings to consider. They’re reasonably self-explanatory, but consult the Slurm and Amarel guides if you need more info. One additional setting we explain here is the array job iterator, which is active by default in our template script.

Array jobs are a very useful HPC feature that allows you to spawn multiple jobs from a single job submission. If you submit submit_array_jobs.sh to Amarel in its current configuration, it will duplicate and run itself 500 times, creating 500 separate jobs on Amarel in the process. The only thing that will vary across those duplicates is an integer value stored in the variable SLURM_ARRAY_TASK_ID; it will be set to 0 for the first job, 1 for the second, 2 for the third, etc. This is a consequence of the following line in the Slurm header:

#SBATCH --array=0-499

This simple feature is very powerful, because it allows you to efficiently run a single script multiple times in parallel while leveraging SLURM_ARRAY_TASK_ID to do different things in each copy. For example, if you have a script that processes neuroimaging data for a single subject, you can use array jobs to process 500 subjects at once by using SLURM_ARRAY_TASK_ID to read in different subjects from a list or a data structure, as in the sketch below.
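
In this sketch, subject_list.txt, process_subject.py, and the --subject flag are all hypothetical placeholders; adapt them to your own script:

# a hypothetical subject list with one subject ID per line
SUBJECT_LIST=subject_list.txt

# pick the subject on line SLURM_ARRAY_TASK_ID + 1 (task IDs start at 0, sed lines at 1)
SUBJECT=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" ${SUBJECT_LIST})

echo "Array task ${SLURM_ARRAY_TASK_ID} -> subject ${SUBJECT}"
python process_subject.py --subject "${SUBJECT}"   # hypothetical script and flag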

Unfortunately, array jobs are capped on Amarel. You can run 500 jobs at once on the main partition (with a max of 6720 CPU cores) and 200 jobs at once on the gpu partition. But we can code our way around this! The simplest solution is to do serial processing inside your submit_array_jobs.sh script. You’ll still only get 500 (or 200) jobs running at once, but each job will now work through multiple SLURM_ARRAY_TASK_ID values. This would look something like this:

# loop over the offsets 0, 500, 1000, 1500 so that each array job handles 4 task IDs
for i in $(seq 0 500 1500); do
    # shift this job's task ID by the current offset
    NEW_SLURM_ARRAY_TASK_ID=$((${SLURM_ARRAY_TASK_ID} + ${i}))
    echo "NEW_SLURM_ARRAY_TASK_ID:" ${NEW_SLURM_ARRAY_TASK_ID}
    python ${scriptsdir}/my_python_script.py --option_a 'foo' --option_b 123 --option_c ${NEW_SLURM_ARRAY_TASK_ID}
done

Now, NEW_SLURM_ARRAY_TASK_ID will loop through [0, 500, 1000, 1500] in the first job, [1, 501, 1001, 1501] in the second job, and so on. This will result in your 500 array jobs performing 2,000 jobs’ worth of processing (when SLURM_ARRAY_TASK_ID=499, NEW_SLURM_ARRAY_TASK_ID will loop through [499, 999, 1499, 1999]). Each job will also take roughly 4 times longer.
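
If you go this route, remember to scale the requested wall time in the header accordingly, since each job now does 4 tasks’ worth of work. E.g., if a single task previously fit comfortably within a 1-hour limit (a hypothetical value), you would bump the time limit to something like:

#SBATCH --time=04:00:00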

If you don’t need an array job, just add a space after the # in the corresponding header line; SLURM_ARRAY_TASK_ID will then never be set, and Amarel will launch only one job.

Selecting nodes based on features

Each node in the processing pool is tagged with certain features, such as its location (e.g., Camden, Piscataway), its CPU architecture, and, where applicable, its GPU architecture. You can use these tags to ask the scheduler to send your job only to nodes with certain features. For example, if you’re submitting a job to the gpu partition, there is (by default) no restriction on which type of GPU will be used to process your job. For relatively quick jobs, this is no big deal; for jobs that require many hours of GPU time, the GPU generation can make a huge difference to run time.

The good news is that it’s very easy to select nodes with certain types of GPUs! All you have to do is add one line to your Slurm job script telling the scheduler about your feature constraint.

E.g., here’s how you would select A100 and RTX 3090 GPUs, both of which are tagged with the ampere feature:

#SBATCH --constraint=ampere

If you’d like to use multiple types of GPUs, you can specify several constraints separated by | (the OR operator). E.g., to use A100 (ampere), RTX 3090 (ampere), and L40S (adalovelace), the line would be:

#SBATCH --constraint=ampere|adalovelace

Here is a list of features and their associated GPUs (updated Sep. 2024):

Feature        GPU(s)                              Release date
adalovelace    L40S                                October 2022
ampere         A100 PCIE 40GB, GeForce RTX 3090    June 2020
titan          GeForce RTX 2080 Ti                 September 2018
volta          Tesla V100 PCIE 32GB                March 2018
pascal         Tesla P100 PCIE 12GB                June 2016
maxwell        Quadro M6000                        March 2015
k80            Tesla K80                           November 2014
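
Hardware comes and goes, so this table may drift out of date. If in doubt, you can query the current feature tags and GPU resources on the gpu partition yourself, e.g.:

# list each gpu-partition node with its feature tags and GPU resources
sinfo --partition=gpu --Node --Format=nodelist,features,gres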

If you don’t know which GPUs are best for your job, we recommend at least avoiding the oldest Quadro M6000 and Tesla K80 cards; a modern CPU will often be faster than these old GPUs. Thus, a good default would be:

#SBATCH --constraint=adalovelace|ampere|titan|volta|pascal