CRC Wiki
CRC Wiki
Log in

FAQ

From CRC Wiki


Contents

CRC ACCOUNTS

How Do I Request a CRC Account?

The easiest way to request a new account is to fill out the Account Request Form. After we create your account, you will receive an email with further instructions and additional information on synchronizing your password with the rest of campus. Please be sure to read and follow the instructions carefully. Also note that a 1 hour introductory course is a co-requisite.

How Do I Find Some Quick Start Information?

Consult the CRC Quick Start Guide for advice on getting started with CRC resources quickly.

I Can't Login (New User / First Time Login)

Your password for logging into CRC resources is your standard ND account password. HOWEVER, when you first obtain an account you have to reset your ND password (as instructed by the CRC Welcome email). Please make sure you reset your password and try logging in again. You can reset your password here

I've Forgotten My Password

Your password is your standard ND account password. You may reset your netID password here

How Do I Request an Account for an External Collaborator?

To get the form needed to make an account for an external collaborator, you must first go to the nd.service-now.com page and log in.

  1. Click on Requests:

    Req shot.png

  2. Next click on NetID: Affiliate access:

    Access shot.png

This will bring you to the form where you may enter in the information. Note that you will need the individual's date of birth and their NDID#. The latter may be obtained by calling Human Resources. If the applicant has never had a Notre Dame account, you may place zeros in the form.


How Can I Maintain My CRC Account After Graduating/Leaving Notre Dame?

Your CRC account (and all associated data) will remain active while your general Notre Dame account remains active. You can expect your Notre Dame account to be deleted within approximately 60 days of leaving the university.

To maintain your Notre Dame account (and hence your CRC account) you should ensure that a faculty sponsor renews your account annually. This process can be initiated by completing an external collaborator request.

I Am Getting A "Module: Command Not Found" Message When I Log In To My CRC Account

This message generally indicates that you have deleted/corrupted your shell configuration scripts which prevent the Module environment from initializing correctly. To repair your scripts please follow these instructions.

REMOTE ACCESS

Why Is My Remote Connection So Slow?

Slow remote GUI response time is actually quite difficult to troubleshoot.

  • First because slow in this regard does not have a fully tangible metric
    • In this case slow means it is impractical for you to work with
  • Secondly because it varies on numerous conditions:
    • OS and Graphics card/drivers on your system
    • Remote visualization client (Putty/Xming)
    • Network reliability (on campus wired is usually pretty good)
    • Load (CPU, RAM, Network, Disk I/O) on the machine you are trying to access

For the first two you need laptop/desktop support which is provided by your department/college IT support folks.

For the third you can verify if you are getting at least 100Mbps consistently: ndt speedtest

For the fourth you can monitor the load on CRC machines here: https://mon.crc.nd.edu/xymon

When I try to login I see the error: "Algorithm Negotiation Failed", what does this mean?

The CRC systems only accept connections from an updated form of encryption, therefore if you are using an older ssh client you may see this error.

To solve this issue, first try updating your OS or current ssh client. If you are connecting from Windows, we recommend MobaXterm.
If you are connecting from an older MacOs system which cannot be updated, you can still connect to CRC systems by first connecting to jumpbox.nd.edu. If you continue to have login issues, contact CRC Support.

TRAINING

How Do I Receive Training?

CRC User Support conducts introductory training for new users all year round.

Periodically throughout the year, CRC organizes scheduled training classes (typically at the beginning of Fall and Spring semesters). These classes are announced on the CRC Training Page.

Outside of the scheduled training classes, training is provided on-demand by CRC User Support staff. Please send an email to CRCSupport@nd.edu to arrange a training session.

CRC SOFTWARE

What Is The Available Software At CRC?

CRC maintains information and usage instructions for all currently available software. This information can also be accessed at the CRC cluster command line using the module help and module whatis commands. More information can be found on the Modules page.

How Do I Request Software To Be Installed/Updated?

All software requests for new/updated software on CRC systems must come from Notre Dame Faculty. Please submit a Software Request Form that is attached to the CRC Software Policy. The software policy also contains information on licensing and cost sharing with CRC.

Please Note: all software requests require license approval via the University's General Council. This process can take up to 21 business days from receipt of the software request. If you require an expedited service, please make that apparent on your request form.

AFS STORAGE

How Do I Check My Available AFS Storage?

You can check your AFS disk usage with the quota command e.g.

> quota

How Do I Request More AFS Storage Space?

Please consult the CRC Storage policy for rules and limitations for user/group storage quotas.

When requesting a storage increase please email CRCSUPPORT@nd.edu with the following information:

  1. Name
  2. Name of PI
  3. Brief Justification for Increase

How Do I Give A Particular User/Group Access To A Directory In My Personal AFS Space?

  1. For a short overview, see this discussion.
  2. For a much more detailed explanation, see this documentation.

I Am Receiving A Locking Authority File Error

If you receive the following error:

> /usr/bin/xauth:  error in locking authority file /afs/crc.nd.edu/user...

you have most likely exhausted your AFS storage quota. Please delete unwanted files and/or request more storage space.

How Do I Add and Remove Users From The Shared Group Space I Administer?

To add a user use the following AFS command:

pts adduser netid group

To remove a user use the following AFS command:

pts removeuser netid group

I'm Not Seeing My Output When Running From AFS!

AFS improves I/O performance by caching program output on the local host. The output data is only written to the file server when the program is complete. To overcome this issue, you should add the fsync command to your submission script. Please see the following wiki page for more details:

http://wiki.crc.nd.edu/wiki/index.php/Synchronizing_Open_Files_in_AFS

I Am Getting The Following Message Using Find

> find ./ -type d -print 
find: WARNING: Hard link count is wrong for .: this may be a bug in your filesystem driver. 

Use the -noleaf option when using the find command:

-noleaf
    Do  not  optimize  by  assuming that directories contain 2 fewer subdirectories than their  hard  link count. 
    This option  is needed  when  searching  file systems that do not follow the Unix directory-link convention, 
    such as CD-ROM or MS-DOS file systems or AFS  volume  mount  points.
 
e.g find ./ -noleaf -type d -print

HIGH-PERFORMANCE PANASAS STORAGE (/scratch365)

How Do I Check My Available (/scratch365) Storage?

You can check your /scratch365 disk usage with the pan_df command e.g.

> pan_df -H /scratch365/netid/

How Do I Request More (/scratch365) Storage Space?

Please consult the CRC Storage policy for rules and limitations for user/group storage quotas.

When requesting a storage increase please email CRCSUPPORT@nd.edu with the following information:

  1. Name
  2. Name of PI
  3. Brief Justification for Increase

JOBS

My Job Is Aborted At Runtime Due To Excessive Disk-Swapping

If your job is aborted at runtime due to disk-swapping, this typically indicates that your simulation is starved of RAM at runtime. A general rule of thumb is to ensure that you request 1 core for every 2 GB of RAM your simulation requires e.g. if your simulation requires 16GB of RAM in total, then you need to request 8 cores in your SGE submission script (either -pe smp 16 or -pe mpi-16 16), irrespective of whether your simulation actually uses 8 cores. This ensures that 8 cores (and potentially 16GB RAM) are assigned to your job and not accessible by other users.

If unsure how much RAM your job will require, you can use the Xymon Memory Monitoring tool to see how much RAM your job is using. The Xymon Memory Monitoring tool contains two graphs. The graphs corresponds to the Physical/Real memory and the Actual memory. In memory management, no actual memory is being set aside whenever a process requests memory, but the memory management unit knows the process is planning on using the memory it requested. This will allow the memory management unit to use the unused memory for other purposes such as caching other data. The Physical/Real memory represents the amount of RAM the process requested. After the process begins writing to RAM, the memory management unit will begins giving actual memory to the process so it can write. If the RAM is being occupied by other data, that data is flushed so it can be used by the process. The RAM that the process is actually using represents the Actual memory graph. So in summary, there should not be a problem if your job's Actual graph never reaches the Physical/Real graph. If it does, that means the machine has run out of RAM for your job and that will cause the system to use the Hard Disk to read/write data. This can cause your job to take longer than expected as well as potentially crash the system your job is running on if it is excessively reading/writing to disk.

How Does Swap Affect Performance?

Once a system starts to run out of RAM, it “swaps” in memory from the hard drive. Hard drives are much slower than RAM, so any process that uses swapped memory will experience a substantial decrease in performance. To demonstrate this, here are the results of HPL tests with increasing values for N.

The following performance data was gathered using HPL on machine dqcneh014:

Swap Performance.png

For HPL tests, performance, which is measured in flops (floating-point operations per second), will normally steadily increase as N increases. As a result, the performance steadily rises until N reaches 37,000. At this point, the system was using over 90% of its available RAM, and it was running at peak performance. Swapping first occurs for the N value of 37500, and due to the usage of slower memory, all N values beyond 37,500 have a greatly decreased performance.

What Is A Swap File And Where Is It Located?

There is a swap file on each machine. It is a portion of the hard drive set aside to supplement RAM, or physical memory. The swap file is also known as virtual memory. Usually, the swap file is used as a RAM overflow, if you will. When there isn’t enough RAM to run an application, or applications, the machine will swap between RAM and the swap file. Since accessing the hard drive is much slower than RAM, performance begins to degrade. When a large portion of swap is needed, it can cripple or crash a machine.

Do Different Operating System Versions Affect Performance?

Each machine uses a version of Red Hat Enterprise Linux, which has various versions available. These different Red Hat builds can have a slight impact on performance.

According to multiple HPL trials run on machine dqcneh014 using a constant N, it appears that RHEL version 7.1 has the slowest performance, while RHEL version 7.2 has a slightly faster performance than the other versions. Other Red Hat versions tested include 6.4, 6.6, and 6.7.

The complete data for this test, and for the test in 7.2, can be found in the following Google Doc: https://docs.google.com/spreadsheets/d/19c2if441JywnQy4N8-NhMAwDFkmCwzY275hc7vYg9g4/edit?usp=sharing


'Unable to run job: error: no suitable queues' Message When Submitting Job Script

By default, UGE applies strict validation of resource requests within your submission script including checks for legal queue names and core counts that match valid parallel environments (pe). These checks are very useful in preventing erroneous jobs entering the batch system and immediately falling into (Eqw) error states.

If an invalid resource request is identified by this validation procedure, your submission script will be rejected immediately with the "Unable to run job:" error message.

For example the mpi-8 machines (machines with 8 cores) are either faculty owned (restricted) or on a restricted access list for codes which require an Infiniband interconnect. Standard user scripts will fail when specifying mpi-8. In the case the user should use mpi-12 as the majority of systems available for general access are 12 core machines.

Note: Scripts that contain the mem_free= resource request i.e.

#$ -l mem_free=X

will unfortunately always fail the validation test (due to lack of memory information at the time the validation is undertaken). To overcome this situation, please disable the validation process in your submission script by adding the following line:

#$ -w n

How Can I Monitor The Behavior Of My Running Jobs?

The behavior of running jobs can monitored using the Xymon online tool. After using qstat to identify the server(s) on which your jobs are running, use the Xymon link below to locate the specific server name. Associated with each server is a series of sub-links that you can use to monitor CPU status, memory usage, communication statistics etc.

CRC Xymon

My Job Has Created Zombies...What Does That Mean?

Zombie Processes are processes that still remain running even though your job has terminated within the Grid Engine batch system. There are various reasons why this happens including ungraceful termination of MPI due to either software or hardware issues.

The CRC runs a script that runs periodically to remove these zombie processes from our systems. If zombie processes are detected during the running of this script, due to one of your jobs, you will be notified via email. If you continue to receive zombie notifications regularly please contact CRC Support for assistance in detecting the cause.

APPLICATIONS

I Need To Install ArcGIS On My System

Please consult the following wiki page for advice on how to request an installation on your local system.

I am trying to compile my code with mvapich2 but cannot find -lpsm_infinipath

Please compile your code on crcfeib01.crc.nd.edu. Please note this front end node is accessible within campus network or through VPN if you are off-campus.

I See Incomplete Images In COMSOL v4+

The OpenGL graphics libraries on our machines are not compatible with the latest versions of Comsol v4+. This results in distortion or incompleteness of rendered images. To overcome this issue please set your COMSOL graphics configuration to use software rendering via the following menu option:

Option -> Preferences > Graphics > Rendering >Software

My Gaussian Jobs Are Being Aborted Due To /tmp Over-usage

By default GAUSSIAN jobs write temporary working data to the /tmp directory of the compute node. For large GAUSSIAN jobs this /tmp space can fill very quickly causing the machine to slowdown/crash ( note: machine supervisors may pre-empt machine crashes by killing your jobs).

To avoid this situation please direct GAUSSIAN to write temporary working files to your own storage allocation (either AFS or /scratch365). Details on how to set this up in your GAUSSIAN scripts can be found here

Request of ArcGIS Installation and Support

All GIS software installation and GIS desktop support for the Colleges of Science and Engineering will be handled by Engineering and Science Computing support. You can get the desktop support team contact info for the Colleges of Science and Engineering from the ESC website or send email at <help@esc.nd.edu>.


GRANTS/PROPOSALS

How Do I Acknowledge the CRC In My Proposals?

To acknowledge collaboration with the CRC, please add the following citation to your proposals:

"This research was supported in part by the Notre Dame Center for Research Computing through [CRC resources]. 
[We specifically acknowledge the assistance of]"

Does The CRC Provide An Example Of A Data Management Plan?

Please consult the following CRC Data Management Plan webpage for more information including sample plans.