CRC Wiki

CRC UGE Environment

From CRC Wiki

General access to the CRC computational resources is managed with the Grid Engine software tool set. Users should understand four key framework components in order to make the most effective use of CRC compute nodes: queues, parallel environments, host groups, and user groups. In this document the terms core, slot, and process are used interchangeably. Note that this documentation covers only the CRC general resource pool; additional unique instances exist for particular research group requirements.

Queues

CRC queues are configured for differing user run-time requirements:

  • long (15 days maximum runtime, no limit on number of cores, normal priority)
  • debug (1 hour maximum runtime, no limit on number of cores, normal priority)

The following limits apply regardless of queue:

  • a maximum of 50 running jobs combined between all queues
  • a maximum of 2,000 tasks within an array job

NOTE: The long queue contains many machines with various hardware architectures. The debug queue consists only of Intel-based servers, each with a total of 24 cores and 64 GB of RAM.

Queues are specified in your submit script with the syntax

#$ -q long
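Putting the queue request into context, a minimal serial submit script might look like the following sketch (the job name and payload are hypothetical, not CRC requirements):

```shell
#!/bin/bash
#$ -q long        # request the long queue (15-day runtime limit)
#$ -N example_job # hypothetical job name

# Grid Engine reads the '#$' lines as directives; to bash they are
# ordinary comments, so the script can also be run directly as a quick
# sanity check before submitting.
msg="example_job starting on $(hostname)"
echo "$msg"
```

Submit the script with qsub; with no -pe line, the job is treated as serial and allocated one core.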


How to list the Available Queues

qconf -sql

Parallel Environments

Parallel Environments (PE) allow the user to specify the mechanism by which the target code runs in parallel. If a PE is not specified, the job will be treated as a serial job and allocated 1 core.

  • Use the MPI PEs (mpi-8, mpi-16, etc.) when you need more cores than are available on a single machine
  • Use the SMP PE when you need no more than the number of cores available on a single machine

CRC parallel environments include:

  • smp (shared memory processing on a single machine up to 64 cores)

The mpi-* PEs are for MPI applications at the CRC, which must have been compiled against the OpenMPI, MPICH2, or MVAPICH MPI libraries:

  • mpi-8 (multi [8 core per] machine processing in 8 core increments)
  • mpi-12 (multi [12 core per] machine processing in 12 core increments)
  • mpi-16 (multi [16 core per] machine processing in 16 core increments)
  • mpi-32 (multi [32 core per] machine processing in 32 core increments)
  • mpi-64 (multi [64 core per] machine processing in 64 core increments)

IMPORTANT - The increments (8, 12, 16, ...) map to the total number of cores per server in the corresponding host groups (e.g., dqcneh, d6copt). A server with 8 cores will not accept an mpi-12 request.

PEs are specified in your submit script with the syntax

#$ -pe mpi-12 12
#$ -pe smp 8
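Because each mpi-N PE allocates whole machines, the slot count requested must be a multiple of N. The rule can be sketched in shell (the numbers here are illustrative, not a CRC-provided check):

```shell
# Each mpi-N PE hands out whole machines, so the requested slot count
# must be a multiple of N; e.g. '#$ -pe mpi-12 48' fills 4 machines.
pe_increment=12   # cores per machine for mpi-12
slots=48          # illustrative slot request
if (( slots % pe_increment == 0 )); then
  machines=$(( slots / pe_increment ))
  echo "ok: $slots slots = $machines whole machines"
else
  echo "invalid: $slots is not a multiple of $pe_increment"
fi
```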

Notes on MPI and Parallel Code

How to list the Available PEs

qconf -spl


Host Groups

Host groups allow for the specification of hosts/servers with differing architectures. For example, a user can restrict a job to Intel-based versus AMD-based servers.

Host groups are specified with the syntax

#$ -q *@@dqcneh

How to list the Available Host Groups

qconf -shgrpl


User Groups

User groups allow for the designation of resource access to specific individuals. This is most often used for faculty- or group-owned resources accessible through the Grid Engine. Users do not need to specify their user group, as it is determined automatically from the user ID. Users must, however, confirm with their faculty adviser that their ID is in the appropriate user group.

How to list the Available User Groups

qconf -sul


Grid Engine Options

Below are some of the more common options you may want to add to your Grid Engine script. For more descriptions and a full listing, see the manual page for qsub:

man qsub
#$ -v variablename[=value] 
Pass a local environment variable to the job instance. For example, to run a batch job with a GUI-based program, you would need to pass it the DISPLAY variable:
#$ -v DISPLAY
#$ -N name 
The name to assign to the job. This name is also used for the output files associated with stdout/stderr.
#$ -M email@your.domain 
Provide an email address to send job information.
#$ -m [abe] 
Optionally send an email at job (a)bort, job (b)egin, or job (e)nd. Do not specify this option unless you also provide -M, as the mail will otherwise be sent to the system administrators by default.
#$ -l h_rt={sec|h:m:s} 
Specify a maximum runtime for your job in either total seconds, or hours:minutes:seconds format.
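Since -l h_rt accepts either total seconds or an h:m:s string, a small helper (an illustrative sketch, not part of Grid Engine) converts between the two formats:

```shell
# Convert an h:m:s runtime string to total seconds, matching the two
# formats accepted by '#$ -l h_rt={sec|h:m:s}'.
hms_to_sec() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base-10 so zero-padded fields like "08" parse correctly
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

hms_to_sec 2:30:00   # a 2.5-hour limit, expressed in seconds
```

So `#$ -l h_rt=2:30:00` and `#$ -l h_rt=9000` request the same maximum runtime.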