Software:python

From CAC Wiki
  • Default Version: 3.5.4
  • Latest Version: 3.10.2

Note: If you insist on using Conda/Anaconda instead of virtualenv, you will be on your own, as Conda is not supported.

Python is an interpreted, interactive, object-oriented programming language, first released in 1991. Its design philosophy emphasizes readable code and easy-to-use syntax. The language comes with a standard library along with an extensive list of third-party packages, which is in part responsible for its popularity. On the Frontenac cluster, Python is available as a module. You can load the default version of Python using the module load command:

 $ module load python

Often you will need a specific version of Python. The following command loads a specific version, for example 3.7.4:

$ module load python/3.7.4 
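After loading a module, one way to confirm that the expected interpreter is active is to check the version from within Python. The sketch below is illustrative only; the helper name version_matches is hypothetical and not part of any cluster tooling.

```python
import sys


def version_matches(expected, actual=None):
    """Return True if `actual` (default: the running interpreter) starts with `expected`.

    `expected` is a dotted version string such as "3.7" or "3.7.4".
    """
    if actual is None:
        actual = sys.version_info
    wanted = tuple(int(part) for part in expected.split("."))
    return tuple(actual[:len(wanted)]) == wanted


if __name__ == "__main__":
    print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
    print("Matches 3.7.4:", version_matches("3.7.4"))
```

Running this with the python command after module load python/3.7.4 should report a match if the module was loaded as intended.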

All available versions of Python on the Frontenac cluster can be listed with the ‘module avail’ command. The list includes both python/2.x and python/3.x versions.

$ module avail python 

Some commonly used Python packages are available through the scipy-stack environment modules. These include NumPy, SciPy, Matplotlib, IPython, pandas, SymPy and nose. As with the python modules, several versions of the scipy-stack module are available. The latest version (scipy-stack/2019a) can be loaded with the following command:

$ module load scipy-stack 
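To see which of the packages listed above are visible to the interpreter once scipy-stack is loaded, a short check along these lines may help. It is a sketch, not an official tool: the function name missing_packages is hypothetical, and the import names are assumed to be the usual ones for these packages.

```python
import importlib.util

# Import names assumed for the packages the text lists for scipy-stack.
SCIPY_STACK = ["numpy", "scipy", "matplotlib", "IPython", "pandas", "sympy", "nose"]


def missing_packages(names):
    """Return the subset of `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(SCIPY_STACK)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only locates the packages and does not import them, so it is quick to run on a login node.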

Installation of python packages using virtualenv

Often, you will need packages beyond those available in scipy-stack. Given the diverse user requirements, it is not practical to install every package in a central stack. The Python modules on our cluster come with the virtualenv tool, which lets you install and manage the packages needed for your project. Typically, we recommend creating the virtual environment in your $HOME directory. Alternatively, you can create it in your $PROJECT directory to make it available to other members of your group. As an example, the following steps show the installation of the biopython package in your $HOME/.local directory using virtualenv.

Steps for installation

Step-0: Change the current directory to /path/to/install/env

$ cd $HOME/.local     ### If this folder doesn't exist, create it first with
                      ### mkdir -p $HOME/.local && cd $HOME/.local

Step-1: Load the Python version required by your packages. For biopython, we need python/3.6, 3.7 or 3.8. We will use python/3.8.0 here.

$ module load python/3.8.0

Step-2: Create and activate a virtual environment ENV in the current directory

$ virtualenv --no-download ENV
$ source ENV/bin/activate 

Step-3: Upgrade ‘pip’ in your virtual environment

 
(ENV) $ pip install --no-index --upgrade pip

Step-4: Determine the dependencies of your packages and install them. For biopython/x.y, a C compiler and numpy are required. You might also need to install additional optional packages; see https://biopython.org/wiki/Download for more information.

(ENV) $ pip install numpy --no-index ## Alternatively, you can module load scipy-stack for numpy 
(ENV) $ pip install biopython --no-deps  
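While ENV is still active, you may want to confirm that the packages were actually installed into it. The following is a minimal sketch that assumes Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_version is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(distribution):
    """Return the installed version of `distribution`, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("numpy", "biopython"):
        print(name, "->", installed_version(name) or "not installed")
```

If scipy-stack supplies numpy through a module rather than pip, whether it shows up here depends on how that module exposes its package metadata.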

Step-5: Once the installation is complete, you can deactivate your virtual environment

(ENV) $ deactivate  
$ 

Note: If you have used Python on your local computer, you might be familiar with Anaconda for managing Python environments and installing packages. We strongly advise against using Anaconda on an HPC cluster, for the following reasons. First, an Anaconda installation can use a significant portion of your inode quota and disk space. Second, Anaconda installs software and libraries that are already available in our software stack, and its copies are not necessarily optimized for our cluster. If needed, you can install Miniconda, which comes with the conda tool, in your $HOME directory to manage your Python environment. Please note, however, that we do not necessarily provide user support for conda and recommend using the virtualenv tool instead.


Using a virtual environment

You only need to create a virtualenv (ENV, for example) once to install the packages necessary for the project. Once the virtual environment has been created in your local directory ($HOME or $PROJECT), you can activate the virtual environment ‘ENV’ by adding the following lines to your batch script.

module load python/3.8.0 
source $HOME/.local/ENV/bin/activate 
python …   ## Run your Python application using the virtual environment ENV
…	   ## Other lines of your script 
deactivate 
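A Python application started from such a batch script can itself report whether it is running inside a virtual environment, which can help when a job picks up the wrong interpreter. This is a sketch under the assumption that the environment was created with virtualenv or venv; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter is running from a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

When the activate line in the batch script has taken effect, the interpreter path printed should point inside the ENV directory.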

Note: Depending on your project's needs, you can either modify your virtual environment by installing new packages into the existing environment, or create more than one virtualenv in your local directory.
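If you keep several virtual environments in one directory, a small helper can list them. The sketch below assumes that each environment contains a bin/activate script, as environments created by virtualenv on Linux do; the function name list_virtualenvs is hypothetical, and $HOME/.local is used only because it is the location chosen in the example above.

```python
import os


def list_virtualenvs(root):
    """Return sorted names of sub-directories of `root` that contain bin/activate."""
    found = []
    for entry in os.listdir(root):
        if os.path.isfile(os.path.join(root, entry, "bin", "activate")):
            found.append(entry)
    return sorted(found)


if __name__ == "__main__":
    root = os.path.expanduser("~/.local")
    if os.path.isdir(root):
        print("Virtual environments in", root, ":", list_virtualenvs(root))
```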