Conda is a popular package management system used in machine learning and artificial intelligence research. It ships as part of the Anaconda distribution and provides a useful alternative to the pip package manager. Conda lets users create many separate environments, each with its own set of packages. Packages installed for one project therefore do not overlap or conflict with those installed for another, as can happen when everything is installed with pip into a single Python installation. Each environment can be tailored to a specific program’s needs, which makes packages easy to manage and access.
On CRC systems, software is made available on a front-end or compute node with the module load command. To use conda, first load its module:
module load conda
Once the module is loaded, you will primarily use conda commands to access its features.
To verify that conda is available and to check which version is installed, run the following.
conda --version
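A small shell function can make that check at the top of a script or job. This is a hypothetical helper (the name check_conda is ours, not part of conda). It only relies on the standard command -v builtin and conda --version.

```shell
# Hypothetical helper (not part of conda): print the conda version,
# or explain how to make conda available if it is not on PATH.
check_conda() {
  if command -v conda >/dev/null 2>&1; then
    conda --version
  else
    echo "conda not found; run 'module load conda' first" >&2
    return 1
  fi
}
```

For example, putting check_conda || exit 1 near the start of a script stops it early with a clear message when the module was not loaded.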
Conda lets you create multiple independent environments, each of which can be filled with packages suited to a particular problem.
To view what environments are already created you may run the following code. Note that there will already be a base environment installed.
conda info --envs
To add a new environment to this list, use the following.
conda create -n ENVNAME
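Package specs can also be listed after the environment name, for example conda create -n ENVNAME python=3.11 numpy, to install them when the environment is created. Running conda create again with a name that is already in use may offer to replace that environment. The sketch below is a hypothetical helper called ensure_env that checks conda env list (equivalent to conda info --envs) before creating anything. It assumes the first column of that listing is the environment name.

```shell
# Hypothetical helper: create a named environment only if it does not exist yet.
# Extra arguments (e.g. python=3.11 numpy) are passed to `conda create` as package specs.
ensure_env() {
  local name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: ensure_env NAME [PACKAGE ...]" >&2
    return 1
  fi
  shift
  # Column 1 of `conda env list` is the environment name; skip comment lines.
  if conda env list | awk '!/^#/ {print $1}' | grep -qx -- "$name"; then
    echo "environment '$name' already exists"
  else
    conda create -y -n "$name" "$@"
  fi
}
```

For example, ensure_env myenv python=3.11 numpy creates myenv the first time and leaves it alone afterwards.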
By default, environments are installed into the envs directory inside your conda directory. You can choose a different location with the --prefix (-p) option; for details, run conda create --help.
After an environment is created, it is essentially dormant until activated. Activating an environment switches your shell into it, giving you access to all of the packages installed there. You can activate any of your environments with the following.
conda activate ENVNAME
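Besides the (ENVNAME) prefix in your prompt, conda activate sets the environment variables CONDA_DEFAULT_ENV (the environment name) and CONDA_PREFIX (its full path). Here is a minimal sketch, using a hypothetical function name active_env, that reports which environment is active.

```shell
# Hypothetical helper: report the active conda environment, or "none".
# Relies on CONDA_DEFAULT_ENV and CONDA_PREFIX, which `conda activate` sets.
active_env() {
  if [ -n "${CONDA_PREFIX:-}" ]; then
    echo "${CONDA_DEFAULT_ENV:-unnamed} -> $CONDA_PREFIX"
  else
    echo "none"
  fi
}
```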
To leave the currently active environment and return to the previous one (usually base), use the following command.
conda deactivate
If you would like to create an exact copy of an environment you may use the following
conda create -n nameofnewenvironment --clone nameoforiginalenvironment
If you would like to delete an environment you may use the following.
conda remove -n ENVNAME --all
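Removal deletes every package in the environment, so a guard can be useful in scripts. The hypothetical remove_env function below is a sketch of one. It refuses to remove base or the currently active environment (read from CONDA_DEFAULT_ENV) before calling conda remove --all. The -y flag skips the confirmation prompt, so only use it once you are sure.

```shell
# Hypothetical helper: delete an environment, refusing to touch base or the active one.
remove_env() {
  local name="${1:-}"
  if [ -z "$name" ] || [ "$name" = "base" ]; then
    echo "refusing to remove '${name:-<empty>}'" >&2
    return 1
  fi
  if [ "$name" = "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "'$name' is active; run 'conda deactivate' first" >&2
    return 1
  fi
  conda remove -y -n "$name" --all
}
```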
Within each conda environment you can install different packages. Each package is a piece of software that may help with whatever problem you are working on.
To view what modules may be available use the following command
If you are looking for a specific package simply use the following search command
conda search PKGNAME
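The search above only looks in the channels you have configured (by default, the defaults channel). To search across all channels hosted on anaconda.org, use the following; the anaconda command is provided by the anaconda-client package.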
anaconda search PKGNAME
For detailed information about every available version of a package, use the following.
conda search PKGNAME --info
Once you have found the name of the package you want, use the following to install it into an environment.
conda install -n ENVNAME PKGNAME
If you omit the “-n ENVNAME” portion, the package is installed into your currently active environment. All installs must be done in a specific conda environment, not the base environment. When installing into the current environment, this means that (ENVNAME) must appear to the left of your [username] in the shell prompt. This ensures that no base packages, such as pip or python, are uninstalled or replaced.
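The rule above can be enforced in a script with a small guard. This hypothetical install_here function is a sketch of that guard. It reads CONDA_DEFAULT_ENV, which conda activate sets to the active environment's name, and refuses to install when no environment or base is active.

```shell
# Hypothetical helper: install packages into the active environment, but never into base.
install_here() {
  case "${CONDA_DEFAULT_ENV:-}" in
    ""|base)
      echo "activate a non-base environment first (conda activate ENVNAME)" >&2
      return 1
      ;;
  esac
  conda install -y "$@"
}
```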
When you need to update one of your packages, use the following update command.
conda update PKGNAME
To update all of the packages in your currently active environment, use the following.
conda update --all
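Because conda update --all can change many packages at once, one option is to clone the environment first with conda create --clone, so you have a copy to fall back on. Below is a sketch using a hypothetical update_with_backup function and a NAME-backup naming scheme of our own choosing.

```shell
# Hypothetical helper: clone ENVNAME to ENVNAME-backup, then update every package in ENVNAME.
update_with_backup() {
  local name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: update_with_backup ENVNAME" >&2
    return 1
  fi
  # Make an exact copy first; only update if cloning succeeded.
  conda create -y -n "${name}-backup" --clone "$name" &&
    conda update -y -n "$name" --all
}
```

If the update breaks something, you can keep working in the backup with conda activate ENVNAME-backup.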
To remove a package from your environment, use the following uninstall command.
conda uninstall PKGNAME
Machine learning work with conda usually involves Python. To list the versions of Python available for installation, use the following.
conda search -f python
Once you have found the specific version of python you want to install you may use the following install command to specify which python you need.
conda install -n ENVNAME python=3.11
To check which version of Python your current environment is using, run the following.
python --version
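To confirm that python3 really comes from the active environment, and is new enough, you can compare Python's sys.prefix with CONDA_PREFIX. The hypothetical check_env_python function below is a sketch of that check. It passes the minimum MAJOR and MINOR version to an inline Python script.

```shell
# Hypothetical helper: succeed only if python3 belongs to the active conda
# environment ($CONDA_PREFIX) and is at least version MAJOR.MINOR.
check_env_python() {
  local major="${1:-3}" minor="${2:-0}"
  python3 - "$major" "$minor" <<'EOF'
import os, sys
want = (int(sys.argv[1]), int(sys.argv[2]))
prefix = os.environ.get("CONDA_PREFIX", "")
# Same directory as the active environment? (resolve symlinks on both sides)
ok_prefix = bool(prefix) and os.path.realpath(sys.prefix) == os.path.realpath(prefix)
ok_version = sys.version_info[:2] >= want
print(f"python {sys.version.split()[0]} at {sys.prefix}")
sys.exit(0 if ok_prefix and ok_version else 1)
EOF
}
```

For example, check_env_python 3 8 || exit 1 in a job script stops the job if the environment was not activated or has an older Python.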
There are several main channels available in Anaconda, including anaconda, conda-forge, r, and bioconda. Each channel hosts a different collection of packages that can be installed into your environments. No channel is more important than another; channels simply organize packages. By default, all users install from the defaults channel. If a package you need is not available there, you can search for it on another channel.
For example, the bioconda channel provides bioinformatics packages. To switch to the bioconda channel and install a bioconda-specific package, use the following.
conda config --add channels bioconda
conda config --set channel_priority strict
The add command places bioconda at the top of the channel list, making it the highest-priority channel. Channel order matters because the same package may exist in more than one channel, causing a channel collision. The second command sets strict channel priority. If a package is available in a higher-priority channel, conda will then not use versions of that package from lower-priority channels. As a result, any dependency that bioconda provides is taken from bioconda rather than from the defaults channel.
To give the defaults channel top priority again, edit the ~/.condarc file so that defaults is listed first, as shown below. Note that ~/.condarc does not exist until you first change a setting with conda config, for example by adding a channel.
channels:
  - defaults
  - bioconda
Then run the following commands.
conda config --set channel_priority flexible
conda update --all
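The first command restores conda's default (flexible) channel priority. The second updates the packages in your current environment according to the new channel order. Bioconda stays in the channel list, so packages that only it provides can still be installed.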
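As an alternative to editing ~/.condarc by hand, the same change can likely be made with conda config. This sketch (the function name defaults_first is hypothetical) relies on conda config --add placing a channel at the top of the list, or moving it there if it is already listed. Run conda update --all afterwards as above.

```shell
# Hypothetical helper: make "defaults" the highest-priority channel again.
defaults_first() {
  # --add puts the channel at the top; an already-listed channel is moved to the top.
  conda config --add channels defaults &&
    conda config --set channel_priority flexible &&
    conda config --show channels
}
```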
We recommend using only the default channel unless absolutely necessary, to avoid channel collisions. If you only need one or two packages that are not available in the default channel, use the following instead.
conda config --append channels bioconda
This adds bioconda to the bottom of the priority list. Conda then prefers packages from the defaults channel and turns to bioconda mainly for packages that defaults does not provide.
Once the channel has been added, install a package from that specific channel by prefixing the package name with the channel name:
conda install bioconda::PKGNAME
Job Submission Using Conda Environments
Because your environments are saved at a fixed path on disk, all of the packages you installed are still there when your job runs. You can therefore set up and customize your environment before submitting a job. The job uses the packages already in that environment, so you do not need to repeat any installations.
To use your environment in a job, load conda and activate the environment in your job script, as in the example below. The first three lines are example job submission directives. Everything after them runs just as it would if you typed it on your node.
#!/bin/bash
#$ -M firstname.lastname@example.org
#$ -m abe

module load conda
source activate ENVNAME

python3 example.py
This runs your Python code in the environment you specify, giving you access to all of the packages you previously installed there. To run multiple jobs at once, refer to the batch job submission page on our wiki.