Getting Started with Conda
Just the basics. What is Conda? Why should you use Conda? How do you install Conda?
What is Conda?
Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux.
Conda can quickly install, run, and update packages and associated dependencies.
Conda can create, save, load, and switch between project-specific software environments on your local computer.
Conda as a package manager helps you find and install packages. If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because Conda is also an environment manager. With just a few commands, you can set up a totally separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.
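As a rough sketch of what "just a few commands" can look like, the helper below wraps the environment-creation step in a small shell function. The function name, the environment name `legacy-project`, and the Python version `3.8` are all placeholders chosen for illustration; substitute whatever your project needs.

```shell
# Create an isolated Conda environment pinned to a specific Python version.
# Usage: create_python_env <env-name> <python-version>
create_python_env() {
  local env_name="$1"
  local python_version="$2"
  conda create --name "${env_name}" "python=${python_version}" --yes
}

# Example call (hypothetical name and version):
#   create_python_env legacy-project 3.8
#   conda activate legacy-project   # switch to it in an interactive shell
#   conda deactivate                # return to the previous environment
```

Your usual environment is left untouched; the new one simply lives alongside it until you activate it.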
Conda vs. Miniconda vs. Anaconda
Users are often confused about the differences between Conda, Miniconda, and Anaconda. The Planemo documentation has an excellent diagram that nicely demonstrates the difference between the Conda environment and package management tool and the Miniconda and Anaconda Python distributions (N.B. the Anaconda Python distribution now has well more than 150 additional packages!).
I suggest installing Miniconda, which combines Conda with Python 3 (and a small number of core system packages), instead of the full Anaconda distribution. Installing only Miniconda will encourage you to create separate environments for each project (and to install only those packages that you actually need for each project!), which will enhance the portability and reproducibility of your research and workflows.
Besides, if you really want a particular version of the full Anaconda distribution, you can always create a new Conda environment and install the distribution into it.
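The original command is not shown here, so the following is only a sketch of one way to do this, assuming the distribution is installed through the `anaconda` metapackage. The function name, the environment name, and the version `2020.02` are illustrative choices, not values taken from this post.

```shell
# Create a new environment containing a pinned release of the full
# Anaconda distribution (installed via the `anaconda` metapackage).
# Usage: create_anaconda_env <env-name> <anaconda-version>
create_anaconda_env() {
  local env_name="$1"
  local anaconda_version="$2"
  conda create --name "${env_name}" "anaconda=${anaconda_version}" --yes
}

# Example call (hypothetical name and version):
#   create_anaconda_env anaconda-2020-02 2020.02
```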
Why should you use Conda?
Of the many different package and environment management systems around, Conda is one of the few explicitly targeted at data scientists.
Conda provides prebuilt packages or binaries, which generally avoids the need to compile packages from source. TensorFlow is an example of a tool widely used by data scientists that is difficult to install from source (particularly with GPU support) but that can be installed using Conda in a single step.
Conda is cross-platform, with support for Windows, macOS, and GNU/Linux, as well as for multiple hardware platforms, such as x86 and Power 8 and 9. In a follow-up blog post I will show how to make your Conda environments reproducible across these different platforms.
Where a library or tool is not already packaged for installation with Conda, you can use other package management tools (such as pip) inside Conda environments.
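One possible way to use pip inside a Conda environment is sketched below. It first installs pip into the environment with Conda, then runs that environment's own pip through `conda run`. The function name and the example arguments are hypothetical.

```shell
# Install a package with pip inside an existing Conda environment.
# Usage: pip_install_in_env <env-name> <package>
pip_install_in_env() {
  local env_name="$1"
  local package="$2"
  # Make sure the environment has its own pip, then call it via that
  # environment's Python so the package lands in the right place.
  conda install --name "${env_name}" pip --yes &&
    conda run --name "${env_name}" python -m pip install "${package}"
}

# Example call (hypothetical names):
#   pip_install_in_env my-project some-package
```

Calling pip as `python -m pip` from within the environment avoids accidentally picking up a system-wide pip.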
How do you install Conda?
Download the 64-bit, Python 3 version of the appropriate Miniconda installer for your operating system from the Miniconda download page and follow the instructions. I will walk through the steps for Linux below, as installing on Linux is slightly more involved.
Download the 64-bit Python 3 install script for Miniconda.
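The download and launch steps might look roughly like the sketch below. It assumes `wget` is available and that the installers are fetched from `https://repo.anaconda.com/miniconda` under the `Miniconda3-latest-Linux-<arch>.sh` naming scheme; the function names are my own, and only two architectures are mapped here.

```shell
# Assumed download location for the Miniconda installers.
MINICONDA_BASE_URL="https://repo.anaconda.com/miniconda"

# Map a machine architecture (as printed by `uname -m`) to an installer name.
miniconda_installer_name() {
  case "$1" in
    x86_64)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
    aarch64) echo "Miniconda3-latest-Linux-aarch64.sh" ;;
    *)       echo "no installer mapped for architecture: $1" >&2; return 1 ;;
  esac
}

# Download the installer for this machine and start it interactively.
install_miniconda() {
  local installer
  installer="$(miniconda_installer_name "$(uname -m)")" || return 1
  wget "${MINICONDA_BASE_URL}/${installer}" && bash "${installer}"
}

# Usage:
#   install_miniconda
```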
The script will present several prompts that allow you to customize the Miniconda install. I generally recommend that you accept the default settings. However, when prompted with the following…
Do you wish the installer to initialize Miniconda3 by running conda init?
…I recommend that you type yes (rather than the default no) to avoid having to manually initialize Conda for Bash later. If you accidentally accept the default, no worries. When the script finishes you just need to type the following commands.
conda init bash
source ~/.bashrc
Once the install script completes, you can remove it.
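For example, assuming the installer was saved in the current directory under the name below (an assumption; adjust it to match the file you actually downloaded):

```shell
# Assumed file name of the downloaded install script.
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"

# -f keeps this quiet if the file has already been removed.
rm -f "${INSTALLER}"
```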
Initializing your shell for Conda
After installing Miniconda you next need to configure your preferred shell to be "conda-aware". You may be prompted to initialize Conda for your shell when running the installation script. If so, then you can safely skip this step.
conda init bash
source ~/.bashrc
(base) $ # prompt indicates that the base environment is active!
It is a good idea to keep your Conda installation updated to the most recent version. The following command will update Conda to the most recent version.
conda update --name base conda --yes
Whenever installing new software it is always a good idea to understand how to uninstall the software (just in case you have second thoughts!). Uninstalling Miniconda is fairly straightforward.
Uninitialize your shell to remove Conda-related content from ~/.bashrc.
conda init --reverse bash
Remove the entire ~/miniconda3 directory.
rm -rf ~/miniconda3
Remove the entire ~/.conda directory.
rm -rf ~/.conda
If present, remove your Conda configuration file.
if [ -f ~/.condarc ]; then rm ~/.condarc; fi
Where to go next?
Now that you have installed the Conda environment and package management tool, you are ready to learn “best practices” for using Conda to manage your data science project environments. In my next post I will cover what I think is a solid, minimal set of “best practices” that you can adopt to get the most out of Conda when you start your next data science project.