Submitting an Array Job to SGE

What are Array Jobs?

Array jobs are commonly used to execute the same type of operation on varying input data sets, each correlated with a task index number. The number of tasks in an array job is unlimited. The task ID range passed to the -t option may be a single number, a simple range of the form n-m, or a range with a step size of the form n-m:s. Hence, the task ID range 2-10:2 results in the task IDs 2, 4, 6, 8, and 10, i.e. a total of 5 identical tasks, each with the environment variable SGE_TASK_ID set to one of the 5 index numbers.
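
To make the range notation concrete, the following sketch expands a task ID range into the list of task IDs it describes. The helper name expand_task_range is hypothetical and not part of SGE; it only mimics the n, n-m, and n-m:s forms described above.

```shell
#!/bin/bash
# expand_task_range is a hypothetical helper, not an SGE command.
# It prints the task IDs described by a range of the form n, n-m, or n-m:s.
expand_task_range() {
    local spec="$1" range step=1 first last i
    local ids=()
    range="${spec%%:*}"              # part before the optional ":step"
    if [[ "$spec" == *:* ]]; then
        step="${spec##*:}"           # explicit step size
    fi
    first="${range%%-*}"             # n
    last="${range##*-}"              # m (equals n for a single number)
    for (( i = first; i <= last; i += step )); do
        ids+=("$i")
    done
    echo "${ids[*]}"
}

expand_task_range 2-10:2   # prints: 2 4 6 8 10
```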

To avoid receiving numerous emails, please do not use e-mail notification when submitting an array job.

A Simple Array Job Example

This example prints the content of two input files, input_1 and input_2:

$ cat input_1
This is the content of input1 file
$ cat input_2
This is the content of input2 file

The submit script array.sh will be:

#!/bin/csh
#$ -N testarray
#$ -t 1-2:1
cat input_${SGE_TASK_ID}

The submission is as usual:

qsub array.sh

Like all other parameters, the array configuration can be omitted from the submission script and passed directly to qsub as follows:

qsub -t 1-2:1 array.sh

The status of the job will look like this:

$ qstat -u afs_id
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID 
---------------------------------------------------------------------------------------------
 485898     0 testarray  afs_id       qw    04/10/2008 15:05:29                    1,2

The last column shows that this job contains 2 tasks (1,2) with an increment of 1.


STDOUT and STDERR of array job tasks are written to separate files, by default named <jobname>.[e|o]<job_id>.<task_id>

$ ls testarray*
testarray.o485898.1  testarray.o485898.2

Each file contains the output of the script command 'cat input_${SGE_TASK_ID}' for its task:

$ cat testarray.o485898.1
This is the content of input1 file

$ cat testarray.o485898.2
This is the content of input2 file
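
Before submitting, it can help to emulate the tasks locally by setting SGE_TASK_ID by hand. The sketch below is a dry run under stated assumptions: run_task is a hypothetical stand-in for the body of array.sh, the files live in a temporary directory, and 0 is a placeholder job ID since no scheduler is involved.

```shell
#!/bin/bash
# Local dry run: emulate the two array tasks without a scheduler.
workdir="$(mktemp -d)"
echo "This is the content of input1 file" > "$workdir/input_1"
echo "This is the content of input2 file" > "$workdir/input_2"

# run_task is a hypothetical stand-in for the body of array.sh.
run_task() {
    cat "$workdir/input_${SGE_TASK_ID}"
}

for id in 1 2; do
    export SGE_TASK_ID="$id"
    # Mimic the <jobname>.o<job_id>.<task_id> naming with placeholder job ID 0.
    run_task > "$workdir/testarray.o0.$id"
done

cat "$workdir/testarray.o0.1" "$workdir/testarray.o0.2"
```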

If the execution order matters, the -hold_jid parameter defines the job dependency list of the submitted job, i.e. the jobs that must finish before it may start.
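
As an illustration, the sketch below holds a follow-up job until the array job named testarray has finished. It is a dry run by default and only prints the qsub commands; the script name postprocess.sh and the job name collect are hypothetical.

```shell
#!/bin/bash
# Dry run by default: QSUB expands to "echo qsub", so commands are only printed.
# Set QSUB=qsub on a cluster to really submit.
QSUB="${QSUB:-echo qsub}"

submit_chain() {
    # Submit the array job.
    $QSUB -N testarray -t 1-2:1 array.sh
    # postprocess.sh (hypothetical) is held until the job named testarray,
    # including all of its tasks, has finished.
    $QSUB -N collect -hold_jid testarray postprocess.sh
}

submit_chain
```

Note that -hold_jid orders whole jobs. It does not by itself order the tasks inside a single array job.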


A More Complex Array Job Example

In this example, we use MATLAB to demonstrate how to work with non-sequential input values: we first create a shell array in the script and then use ${SGE_TASK_ID} to select one of its values.

#!/bin/bash
#$ -N testarray
#$ -t 1-4:1
#$ -r y

module load matlab/9.2

export MATLABPATH={PATH_TO_ADDITIONAL_M_FILES_IF_NEEDED}

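# Bash arrays are indexed from 0, while task IDs start at 1.
# Strings carry inner single quotes so that MATLAB receives them as character arrays.
array=( "'orange'" "'apple'" 3.14 16 )

matlab -nodisplay -nosplash -nojvm -r "myFunction(${array[$((SGE_TASK_ID - 1))]});exit"

The function in myFunction.m receives 'orange', 'apple', 3.14, and 16 as its input parameter for tasks 1, 2, 3, and 4 respectively.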
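
Because bash arrays start at index 0 and task IDs start at 1, the lookup needs an offset of one. The following sketch shows the mapping on its own, without MATLAB; param_for_task is a hypothetical helper and the loop merely stands in for the four tasks.

```shell
#!/bin/bash
# Parameter list; strings keep inner single quotes for use as MATLAB arguments.
params=( "'orange'" "'apple'" 3.14 16 )

# param_for_task is a hypothetical helper: it maps a 1-based task ID
# to the matching element of the 0-based bash array.
param_for_task() {
    local task_id="$1"
    echo "${params[$((task_id - 1))]}"
}

# Stand-in for the four tasks of the array job.
for id in 1 2 3 4; do
    echo "task $id -> myFunction($(param_for_task "$id"))"
done
```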

Deleting A Job Within An Array Job

You may want to delete a single task of your array job without deleting the entire array. To do so, follow the pattern:

qdel JobID.TaskID

For instance, to delete the second task of an array job with the job ID 442527, you would type:

qdel 442527.2

This kills the second task of the array job without affecting any of the other tasks.