Installing Condor at Notre Dame

These are the instructions to install Condor onto a machine at Notre Dame. For more information about using Condor, please see the Notre Dame Condor Pool.

Requirements

Installing Condor onto your machine at Notre Dame is easy. The software and configuration files are already available in AFS, so a few simple commands will make your machine join the existing Condor pool. There are a few prerequisites that you must double-check first:

  1. Your machine must have AFS installed.
  2. Your machine must have a public network address and be able to communicate with other machines within the campus network.
  3. If you have a firewall, it must permit both UDP and TCP traffic on ports 9000-10000.
  4. You must have root access to your machine. If you do not, ask your system administrator to install Condor with these directions.
  5. Make sure that you fix the common hostname problem described in "Common Hostname Problem on Linux" below.
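
Two of these prerequisites (root access and a visible AFS tree) can be checked from a shell. The following is only a minimal sketch: the function name check_prereqs is hypothetical, and it assumes that AFS being installed shows up as a /afs/nd.edu directory, which is inferred from the paths used later on this page. It does not test the network or firewall requirements.

```shell
#!/bin/sh
# Hypothetical helper: check root access and AFS visibility.
# Usage: check_prereqs [UID] [AFS_DIR]
# Defaults: the current user's uid and /afs/nd.edu (an assumption).
check_prereqs() {
    uid=${1:-$(id -u)}
    afs_dir=${2:-/afs/nd.edu}
    status=0
    if [ "$uid" -ne 0 ]; then
        echo "need root access (uid is $uid)"
        status=1
    fi
    if [ ! -d "$afs_dir" ]; then
        echo "AFS not visible at $afs_dir"
        status=1
    fi
    return $status
}
```

Calling check_prereqs with no arguments checks the current user and the default AFS path, and prints nothing when both checks pass.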

Installation

If you meet the above requirements, then install Condor following these steps.

Linux/Unix

  1. Log in as root.
  2. Ensure that the condor user appears in /etc/passwd. If you must add it, use this entry:
    condor:x:108172:40:Condor Batch System:/afs/nd.edu/user37/condor:/bin/csh
  3. Add ~condor/software/bin and ~condor/software/sbin to your path.
  4. Run condor_init just once to create a few directories on your machine in /var/condor.
  5. Run ~condor/software/config/condor.boot start to start Condor.
  6. Set up your machine to start Condor when it boots:
    1. If you have /etc/rc.local, add ~condor/software/config/condor.boot start to the very end.
    2. If you have System V boot scripts, copy ~condor/software/config/condor.boot into the appropriate directories.
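
Steps 2 through 5 above can be sketched as a small script. This is an illustration rather than an official installer: the names install_condor, has_condor_user, run, and DRY_RUN are hypothetical, and ~condor is written out as /afs/nd.edu/user37/condor, the home directory from the passwd entry in step 2. By default the sketch only prints what it would do.

```shell
#!/bin/sh
# Sketch of the Linux/Unix steps. Prints commands unless DRY_RUN=0.
CONDOR_HOME=/afs/nd.edu/user37/condor   # home directory from the passwd entry
CONDOR_ENTRY='condor:x:108172:40:Condor Batch System:/afs/nd.edu/user37/condor:/bin/csh'
DRY_RUN=${DRY_RUN:-1}

# Succeeds if the given passwd file already has a condor user.
has_condor_user() {
    grep -q '^condor:' "$1"
}

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: install_condor [PASSWD_FILE]   (default /etc/passwd)
install_condor() {
    passwd_file=${1:-/etc/passwd}
    # Step 2: report the entry to add; the file itself is never edited here.
    if ! has_condor_user "$passwd_file"; then
        echo "add to $passwd_file: $CONDOR_ENTRY"
    fi
    # Step 3: put the Condor tools on the path.
    PATH=$PATH:$CONDOR_HOME/software/bin:$CONDOR_HOME/software/sbin
    export PATH
    # Steps 4 and 5: create the local directories, then start Condor.
    run condor_init
    run "$CONDOR_HOME/software/config/condor.boot" start
}
```

Running install_condor as-is only lists the actions. Setting DRY_RUN=0 before calling it as root would execute condor_init and condor.boot for real, so condor_init should still be run only once per machine.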

Mac

On a Macintosh (PPC or Intel), do the following:

  1. Log in as root.
  2. Using the "Users" tool in the control panel, add a Condor user.
  3. Using "Applications/Utilities/NetInfo", edit the Condor user so that uid=108172 and home=/afs/nd.edu/user37/condor
  4. Copy ~condor/software/config/MacStartup/Condor into /Library/StartupItems.
  5. Reboot.


Common Hostname Problem on Linux

Newly installed Linux machines often have a common configuration problem that breaks several distributed systems, including Condor. Many programs need to determine their own IP address so that they can tell other machines how to contact them. This doesn't work with the default /etc/hosts on many Linux machines, which looks like this:

Incorrect /etc/hosts:

    127.0.0.1 machinename localhost localhost.localdomain

When a program attempts to determine its own IP address, it gets the loopback address 127.0.0.1. When it communicates that address to other machines, they cannot use it to reach the program. There are two solutions to this problem. You can either put the proper IP address in /etc/hosts, or you can leave the machine name out entirely and let DNS resolve it.

Correct /etc/hosts Option A:

    127.0.0.1 localhost localhost.localdomain
    1.2.3.4   machinename 

Correct /etc/hosts Option B:

    127.0.0.1 localhost localhost.localdomain

After making this change, either reboot the machine or run condor_restart -master to reset Condor.
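
To check whether a machine has this problem, the hosts file can be scanned for the machine name on a loopback line. The following is a minimal sketch; the function name name_on_loopback is hypothetical, and it only inspects the file it is given, not DNS.

```shell
#!/bin/sh
# Succeeds (exit status 0) if HOSTS_FILE maps MACHINE_NAME to a 127.x address.
# Usage: name_on_loopback HOSTS_FILE MACHINE_NAME
name_on_loopback() {
    awk -v name="$2" '
        { sub(/#.*/, "") }              # ignore comments
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For example, on the machine itself:

    name_on_loopback /etc/hosts "$(hostname)" && echo "fix /etc/hosts"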

Opening Firewall Ports

This section applies only to CRC users. OIT/CSE machines already allow Condor access.

1. Run /etc/init.d/iptables stop to stop iptables.
2. Edit /etc/sysconfig/iptables and add the following lines before the "REJECT" line:

    -A RH-Firewall-1-INPUT -s 129.74.0.0/16 -m state --state ESTABLISHED,NEW -p tcp -m tcp --dport 9000:10000 -j ACCEPT
    -A RH-Firewall-1-INPUT -s 129.74.0.0/16 -m state --state ESTABLISHED,NEW -p udp -m udp --dport 9000:10000 -j ACCEPT

3. Restart iptables by running /etc/init.d/iptables start.

Note: This can also be accomplished by opening firewall ports between 9000 and 10000 using the firewall GUI tool.
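
The edit in step 2 can also be previewed without touching the real file. The sketch below prints a copy of a rules file with the two Condor rules placed before the first line containing REJECT. The function name add_condor_rules is hypothetical, and the rules are written with the leading -A that the iptables rules file format uses for appended rules.

```shell
#!/bin/sh
# Print RULES_FILE with the Condor rules inserted before the first REJECT line.
# The input file is not modified.
# Usage: add_condor_rules RULES_FILE
add_condor_rules() {
    awk '
        /REJECT/ && !inserted {
            print "-A RH-Firewall-1-INPUT -s 129.74.0.0/16 -m state --state ESTABLISHED,NEW -p tcp -m tcp --dport 9000:10000 -j ACCEPT"
            print "-A RH-Firewall-1-INPUT -s 129.74.0.0/16 -m state --state ESTABLISHED,NEW -p udp -m udp --dport 9000:10000 -j ACCEPT"
            inserted = 1
        }
        { print }
    ' "$1"
}
```

Running add_condor_rules /etc/sysconfig/iptables shows the result on standard output, so it can be reviewed before the file is edited by hand.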