Conda

Conda is a popular package and environment management system widely used in machine learning and artificial intelligence research. It ships as part of the Anaconda distribution and provides a useful alternative to the pip package manager. Conda lets users create many separate environments, each containing its own set of packages, without the overlap or version conflicts that can occur when everything is installed into one location with pip. Each environment can be customized to a specific program’s needs, which makes packages easy to manage and access.

Initial setup

On CRC systems, software is loaded onto a front end or compute node by executing a module load command. To use conda, you must therefore first load the conda module on your node.

 module load conda 

Once the module is loaded, you will primarily use conda commands to access the different features available.

To verify that conda was loaded successfully and to check which version is available, run the following.

 conda info 

Environment Management

In Conda you can create multiple, independent environments. Each environment can hold the packages suited to a particular problem.

To view the environments that already exist, run the following command. Note that a base environment will already be present.

 conda info --envs 

To add to this list of environments, create a new environment using the following

 conda create -n ENVNAME 

Keep in mind that environments are installed by default into the envs directory inside your conda directory. You can specify a different path if you would like. For details, run “conda create --help”.
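As a rough sketch of the custom-path option mentioned above, conda create accepts --prefix (short form -p) in place of -n. The helper name and the directory shown here are hypothetical; use any location you have write access to.

```shell
# Hypothetical helper: create an environment at an explicit directory
# instead of the default envs directory.
create_env_at() {
    env_path="$1"    # e.g. "$HOME/conda-envs/myproject" (illustrative path)
    # --prefix takes the place of -n ENVNAME; --yes skips the confirmation prompt.
    conda create --yes --prefix "$env_path"
}

# Example usage (after running "module load conda"):
# create_env_at "$HOME/conda-envs/myproject"
# conda activate "$HOME/conda-envs/myproject"   # activate by path, not by name
```

Environments created this way are referred to by their path rather than by a name.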

After an environment is created it is essentially dormant until activated. Activating an environment switches you into it and gives you access to all of the packages installed there. You may activate any of your environments using the following

 conda activate ENVNAME 

If you would like to jump out of your currently loaded environment you may use the following command to bring you back to the base environment.

 conda deactivate 

If you would like to create an exact copy of an environment you may use the following

 conda create -n nameofnewenvironment --clone nameoforiginalenvironment 

If you would like to delete an environment you may use the following.

 conda remove -n ENVNAME --all 

Module Management

Within each conda environment you may install different modules, which conda calls packages. Each package is a piece of software that you may find useful for the problem you are solving.

To view the packages installed in the current environment, use the following command

 conda list 

If you are looking for a specific package simply use the following search command

 conda search PKGNAME 

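The search above only covers the channels you currently have configured, which by default is just the default channel. If you would like to search across all channels, use the following, which requires the anaconda-client package to be installed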

 anaconda search PKGNAME 

For more specific information about all of the package versions use the following

 conda search PKGNAME --info 

Once you have found the name of the package you want, use the following command to install it into an environment.

 conda install -n ENVNAME PKGNAME 

Note that if you omit the “-n ENVNAME” portion of the command, the package will be installed in your currently active environment. All installs should be performed in a specific conda environment, not the base environment, which means that (ENVNAME) must appear at the left of your shell prompt before you install. This ensures that no base packages, such as pip or python, are changed or uninstalled.

When you need to update one of your packages you may use the following update command.
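One way to guard against installing into the base environment by accident is to check the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. The function name below is hypothetical, and the check is only a sketch that assumes the base environment is named base.

```shell
# Hypothetical helper: install a package only when a non-base environment is active.
safe_install() {
    pkg="$1"
    # CONDA_DEFAULT_ENV is set by "conda activate"; it is empty when nothing is active.
    if [ -z "${CONDA_DEFAULT_ENV:-}" ] || [ "$CONDA_DEFAULT_ENV" = "base" ]; then
        echo "Refusing to install $pkg: activate a non-base environment first." >&2
        return 1
    fi
    # Installs into the currently active environment.
    conda install --yes "$pkg"
}

# Example usage:
# conda activate ENVNAME
# safe_install PKGNAME
```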

 conda update PKGNAME 

If you need to update all of your packages in your currently loaded environment simply use the following

 conda update --all 

If you would like to remove a package from your environment you may do so with the following uninstall command

 conda uninstall PKGNAME 

Python Management

Machine learning work in conda very often relies on Python. To look for the specific versions of Python available for install use the following

 conda search -f python 

Once you have found the specific version of python you want to install you may use the following install command to specify which python you need.

 conda install -n ENVNAME python=3.4 
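If the environment does not exist yet, the Python version can also be pinned at creation time by passing the same python=VERSION specification to conda create. A minimal sketch follows; the helper name is hypothetical, and the environment name and version number in the usage line are placeholders.

```shell
# Hypothetical helper: create a new environment with a pinned Python version.
create_python_env() {
    env_name="$1"      # name of the environment to create
    py_version="$2"    # Python version to pin, e.g. 3.10 (placeholder)
    # --yes skips the confirmation prompt.
    conda create --yes -n "$env_name" "python=$py_version"
}

# Example usage:
# create_python_env ENVNAME 3.10
```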

To verify which version of python your current environment is using, use the following

 python --version 

Channel Management

There are several main channels available for use in Anaconda. These include anaconda, conda-forge, r, and bioconda. Each channel contains different packages that may be installed into your environment. None of these channels is more important than another; they exist to organize packages. By default, all users are on the default channel. If a specific package that you are looking to install is not available on this default channel, you can search for that package on another channel.

For example, the bioconda channel is a Conda channel that provides bioinformatics packages. If we wanted to give the bioconda channel top priority before installing a bioconda-specific package, we would do so using the following

conda config --add channels bioconda
conda config --set channel_priority strict 

When adding this channel using the “add” command we are telling Conda to place the channel at the top, that is, at the highest priority, of the channels accessible to our manager. The order of channels in your Conda configuration matters because the same package may exist in more than one channel, which can lead to channel collisions. To avoid collisions from duplicate packages we use the second line above. With strict priority, any package that exists in the bioconda channel is always taken from bioconda rather than from the default channel.

If we wanted to set the priority back to our default channel we would have to edit the ~/.condarc file so that defaults is the first channel shown. The ~/.condarc file does not exist by default; it is created the first time you run a conda config command, for example when adding a channel.

channels:
  - defaults
  - bioconda

Then we must run the following commands on the command line

conda config --set channel_priority true
conda update --all

This restores the default channel as the highest priority and updates the packages in the current environment accordingly, while the bioconda channel remains available at lower priority.

We recommend using only the default channel unless absolutely necessary, in order to avoid channel collisions. If you need only one or two packages from another channel that are not available in the default channel, use the following

 conda config --append channels bioconda 

This will push the bioconda channel to the bottom of the priority list, ensuring that there are no conflicts in dependencies with the default channel.
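To confirm the resulting order after adding or appending a channel, the current settings can be printed with conda config --show. The wrapper function below is a hypothetical convenience; the two conda commands can equally be run directly.

```shell
# Hypothetical helper: print the configured channels and the priority mode.
show_channel_setup() {
    # Channels are listed from highest priority to lowest.
    conda config --show channels
    # Reports the current channel_priority setting.
    conda config --show channel_priority
}

# Example usage:
# show_channel_setup
```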

Once the desired channel has been added, use the following to install a package from that specific channel

 conda install bioconda::PKGNAME 

Job Submission Using Conda Environments

Because your environments are saved at a fixed file path, every package you install remains available in the referenced environment, which lets you customize the environment before submitting your job. Once the job is submitted it uses the packages already in your environment, so you do not need to redo any of your previous installations.

To load your environment in a job, use the following script. The first three lines are example job submission directives; everything after them runs exactly as it would if you typed the same commands on your node.

#!/bin/bash
#$ -M johndoe@nd.edu
#$ -m abe

module load conda
source activate ENVNAME
python3 example.py 

This will run your Python code in the environment that you specify, giving it access to whatever packages you have previously installed there. If you would like to run multiple jobs at once then you should refer to the batch job submission page on our wiki.