Software:Frontenac

From CAC Wiki

The Frontenac cluster includes a wide variety of software and compilers. There are several new ways of accessing and using software, which are documented here.

The module system

Frontenac uses the Lmod module system, as do the Alliance clusters and many other HPC clusters around the world.

On a large compute cluster, it is impossible to have all sets of software loaded all the time by default. Some software has multiple versions, some packages conflict with each other, and some pieces of software need to be configured separately for different use cases. Environment modules are designed to solve this problem, by treating each software package and all of its associated files as a distinct package to be loaded on demand. Modules also handle the loading of dependencies. For instance, loading the R programming language would be done by loading the "r" module - any dependencies would be handled behind the scenes by the module system without any user intervention.

Note: The default Standard Environment on the Frontenac cluster is StdEnv/2016.4. If you prefer, you can switch to the more modern StdEnv/2020 automatically on login by creating a .modulerc file in your $HOME directory. In this file, add a line with

module-version StdEnv/2020 default
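
As a minimal sketch, the line can be added from the shell as shown here. The function name set_default_stdenv is made up for this illustration, and its optional argument exists only so that a different target file can be given; by default it writes to $HOME/.modulerc. The grep check keeps the line from being added twice.

```shell
# set_default_stdenv is a hypothetical helper name, not part of Lmod.
# It appends the module-version line to a modulerc file unless the
# exact line is already present.
set_default_stdenv() {
    rcfile="${1:-$HOME/.modulerc}"
    line='module-version StdEnv/2020 default'
    touch "$rcfile"
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Usage (uncomment to modify your own ~/.modulerc):
# set_default_stdenv
```

The change should take effect at your next login; module list will show which StdEnv is loaded.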


How to use the module system

What you want to do                                  Lmod command (Frontenac cluster)
See all available software                           module avail
See a short description of what each package does    module spider
Load the software package "packageName"              module load packageName
Use a specific version of a software package         module load packageName/version
View currently loaded packages                       module list
Unload a package                                     module unload packageName
Unload all packages                                  module purge

Please note that all commands are case-sensitive. For comprehensive documentation on using the module system (including how to write your own modules), refer to the official Lmod documentation: http://lmod.readthedocs.io/en/latest/

Local vs. Alliance software

Software on the Frontenac cluster can come from two locations: a local installation or the Alliance's centralized software stack. The Alliance software stack is standardized and contains a set of software that is identically compiled and set up on every cluster where it is installed. This helps with reproducibility and with scaling your work across multiple clusters: the same software will work the same way, regardless of where you are using it. There is also a large amount of locally installed software. This is how most software requiring licensing or other special local considerations is installed. Both sets of software are used in the same way: just run module load softwareName.

Please note that if you are the first user to use an Alliance software package on a node (or it has not been used in some time), the software may initially appear to "hang" and do nothing for several seconds on launch. This is normal: the software is being re-downloaded and cached on the local system. To tell whether a piece of software comes from this centralized stack, you can run which <some_command>. If the output begins with /cvmfs, it is part of the Alliance software stack.
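
The /cvmfs check can be wrapped in a small shell function. This is only a sketch: is_cvmfs_path and software_origin are hypothetical helper names, and the function uses command -v (the portable equivalent of which) to resolve the command.

```shell
# is_cvmfs_path and software_origin are hypothetical helper names.

# Succeeds if the given path lies under /cvmfs.
is_cvmfs_path() {
    case "$1" in
        /cvmfs/*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Prints where a command resolves from. For shell builtins and functions,
# "command -v" prints a bare name rather than a path, so those are
# reported as not coming from /cvmfs.
software_origin() {
    cmd_path=$(command -v "$1") || { echo "$1: not found" >&2; return 1; }
    if is_cvmfs_path "$cmd_path"; then
        echo "$1: Alliance software stack ($cmd_path)"
    else
        echo "$1: local or system install ($cmd_path)"
    fi
}

# Usage: software_origin python3
```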

List of all installed software

For a reasonably up-to-date list of installed software, please check the Alliance software page. The most up-to-date list will be the module system itself. To see all available packages that do not conflict with the current environment, run the module avail command. Typically, packages not compatible with the default environment will be hidden (such as those compiled using the GNU compilers instead of the Intel compiler defaults).

Note that if you just want to load the default version of a piece of software you do not need to include the version when loading a module. For instance, module load gcc will load the default version of GCC (version 5.4.0).

Finding a specific package with "module spider"

To check if a package exists (compatible or not), use module spider packagename. This will typically display a list of package versions, along with instructions on how to load each (dependencies may change between versions). Here's an example of how a user might find and use the OpenCV library.

First the user looks for packages. module avail shows way too many packages, and module avail opencv shows no match. One nice thing to note here is that module names are always lowercase.

[user@caclogin02 ~]$ module avail

-------------------------------------------------- /global/software/lmod/modules --------------------------------------------------
   abaqus/2017       (phys,D)    bayescan/g++540                   flexbar/3.0.3                   hwt/hwt_7.2.gnu
   adf/2017_108      (chem)      chapel/1.15.0         (t)         freesurfer/6.0.0                hwt/hwt_7.2.intel (D)
   afni/17.3.05                  cisc875/default                   fsl/5.0.10          (D)         ics/2017u1
   agouti/v0.3.3                 comsol/53             (phys,D)    gaussian/g09e1_sse4 (chem)      matlab/R2017a     (t)
   allpaths-lg/52488 (bio)       dotnet/1.0.4                      gaussian/g16a3_sse4 (chem)      pyrx/094
   anaconda/2.7.13               dotnet/1.1.4                      gaussian/g16b1_sse4 (chem,D)    qiime/1.9.1
   anaconda/3.5.3    (D)         dotnet/2.0.0          (D)         gurobi/752                      redundans/a6621dc
   ansys/ansys181    (phys,D)    faststructure/default             hisat2/2.1.0        (bio,D)

--------------------------------------------------- MPI-dependent avx2 modules ----------------------------------------------------
   abinit/8.2.2               (chem)      lumpy/0.2.13                 (bio)       plumed/2.3.2          (chem,D)
   abinit/8.4.4               (chem,D)    meep/1.3                     (phys)      pnetcdf/1.8.1         (io)
   abyss/1.5.2                (bio)       mpe2/2.4.9b                  (m)         psi4/1.1              (chem)
# more output omitted for brevity

[user@caclogin02 ~]$ module avail opencv
No modules found!
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Instead, the user should use module spider opencv to find if the OpenCV module is present.

[user@caclogin02 ~]$ module spider opencv

-------------------------------------------------------------------------------------------------------------------------------
  opencv:
-------------------------------------------------------------------------------------------------------------------------------
    Description:
      OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library.
      OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of
      machine perception in the commercial products.

     Versions:
        opencv/2.4.13.3
        opencv/3.3.0

-------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "opencv" module (including how to load the modules) use the module's full name.
  For example:

     $ module spider opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------

This command shows multiple versions. Next, we look at how to load the latest one, 3.3.0:

[user@caclogin02 ~]$ module spider opencv/3.3.0

-------------------------------------------------------------------------------------------------------------------------------
  opencv: opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------
    Description:
      OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library.
      OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of
      machine perception in the commercial products.

    Properties:
      Visualisation software / Logiciels de visualisation

    You will need to load all module(s) on any one of the lines below before the "opencv/3.3.0" module is available to load.

      nixpkgs/16.09  gcc/5.4.0
      nixpkgs/16.09  gcc/5.4.0  cuda/8.0.44
 
    Help:
      
      Description
      ===========
      OpenCV (Open Source Computer Vision Library) is an open source computer vision
       and machine learning software library. OpenCV was built to provide
       a common infrastructure for computer vision applications and to accelerate
       the use of machine perception in the commercial products.
      
      
      More information
      ================
       - Homepage: http://opencv.org/

In this case, it looks like nixpkgs and gcc are the two dependencies for OpenCV. The default intel module is incompatible with gcc which is why these packages are hidden by default. Let's try loading the dependencies and then the "opencv" module.

[user@caclogin02 ~]$ module load nixpkgs/16.09  gcc/5.4.0

Lmod is automatically replacing "intel/2016.4" with "gcc/5.4.0".


Due to MODULEPATH changes, the following have been reloaded:
  1) openmpi/2.1.1     2) r/3.4.3

[user@caclogin02 ~]$ module load opencv

The OpenCV module now works as advertised. Here we demonstrate with the OpenCV Python library:

[user@caclogin02 ~]$ module load python

The following have been reloaded with a version change:
  1) python/3.5.2 => python/3.5.4

[user@caclogin02 ~]$ python3
Python 3.5.4 (default, Dec  4 2017, 16:30:40) 
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.3.0'
>>> 

To sum things up, the user could access the opencv module with the following in their jobs:

# nixpkgs is always loaded unless you completely unload the software stack, 
# and gcc/5.4.0 is the default version of GCC
module load gcc
module load opencv
module load python
python3 opencv_script.py
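
In a job script it can help to stop early if a module fails to load, rather than continuing with the wrong environment. The following is a minimal sketch: load_required_modules is a hypothetical helper name, and it assumes that module load returns a non-zero exit status when a module cannot be loaded.

```shell
# load_required_modules is a hypothetical helper, not an Lmod command.
# It loads each named module in order and stops at the first failure.
load_required_modules() {
    for mod in "$@"; do
        if ! module load "$mod"; then
            echo "Could not load '$mod'; try: module spider $mod" >&2
            return 1
        fi
    done
}

# Usage in a job script, with the modules from the example above:
# load_required_modules gcc opencv python || exit 1
# python3 opencv_script.py
```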

Completely unloading the software stack

The Compute Canada software stack provides a complete set of binaries, libraries, development headers, and other system utilities that might interfere with things if you want to compile against or use the default, "vanilla" Linux installation. To completely unload all traces of the Compute Canada software stack, you can use the following command:

[user@caclogin02 ~]$ module purge --force
[user@caclogin02 ~]$ module list
No modules loaded

To re-load the software stack:

[user@caclogin02 ~]$ module load StdEnv
[user@caclogin02 ~]$ module list

Currently Loaded Modules:
  1) nixpkgs/16.09   (S)   3) gcccore/.5.4.0    (H)   5) intel/2016.4    (t)      7) openmpi/2.1.1 (m)
  2) icc/.2016.4.258 (H)   4) ifort/.2016.4.258 (H)   6) imkl/11.3.4.258 (math)   8) StdEnv/2016.4 (S)

  Where:
   S:     Module is Sticky, requires --force to unload or purge
   m:     MPI implementations / Implémentations MPI
   math:  Mathematical libraries / Bibliothèques mathématiques
   t:     Tools for development / Outils de développement
   H:                Hidden Module

Software installation

Users are free to install their own software in their own home directory. Compile software normally, and it should work. It is not necessary to run the "sudo make install" or "make install" steps, as these will typically attempt to install the software to the wrong location. Similarly, do not ask to run "sudo yum install"/"sudo apt install" commands on the cluster: this would install things only on the local machine and would not take effect for jobs running on the worker nodes. Dependencies like libraries, R packages, etc. can either be installed on the cluster filesystem (your home directory, for instance) or through the Compute Canada build system. In the event that a dependency or library is not present, please contact us and we can install it for you on the cluster filesystem or Compute Canada software stack, as appropriate.
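
For example, once a program has been compiled in your home directory, its location can be added to PATH so that it can be run by name without any install step. This is only a sketch: prepend_path is a hypothetical helper name, and $HOME/software/mytool/bin is a placeholder for wherever your build puts its executables.

```shell
# prepend_path is a hypothetical helper name.
# It puts a directory at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already on PATH, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Usage (placeholder directory; adjust to your own build location):
# prepend_path "$HOME/software/mytool/bin"
```

Job scripts that need the tool can make the same call before running it.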

Language-specific instructions

C / C++ / Fortran

The default compilers are actually the Intel compiler suite. Though these typically give better performance than the GNU compilers, they do not necessarily work with all software packages. If you're running into compilation issues, trying the following compilers in order may work (generally the older the compiler, the more likely it is to work with old software).

  • intel/2016.4 - (this is the default)
  • gcc/5.4.0 - GNU compilers generally have better compatibility. This is the default GNU compiler.
  • gcc/4.8.5 - The same version shipped with CentOS/RHEL 7, generally should work with most software and supports C++11.
  • ics/2017u1 - We maintain an independent version of the Intel compilers not on the Compute Canada software stack. Unload the software stack with module purge --force and attempt to recompile using the ics module.
  • GCC 4.8.5 (System compilers) - Unload the software stack with module purge --force and attempt to recompile. You may need to install the required development headers and libraries. If the dependency list is long, you may want to look into installing software via EasyBuild (the same build system as Compute Canada, but it can take a very long time, as it compiles everything from scratch).
  • Contact us - If you've reached this point, the software likely requires special attention. Contact us at cac.help@queensu.ca, and we can have a go at installing the software.
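
The first three entries in this list can be tried with a small loop. The sketch below is not an official tool: build_with_first_working_compiler is a hypothetical helper, the build command is whatever your project uses (make is only an example), and it assumes that module load returns a non-zero status on failure. The ics and system GCC options are left out because they first require unloading the software stack.

```shell
# build_with_first_working_compiler BUILD_CMD MODULE...
# Hypothetical helper: loads each compiler module in turn, runs BUILD_CMD,
# and stops at the first combination that succeeds.
build_with_first_working_compiler() {
    build_cmd=$1
    shift
    for compiler in "$@"; do
        echo "Trying $compiler"
        # BUILD_CMD runs in a child shell, which inherits the environment
        # set up by the module that was just loaded.
        if module load "$compiler" && sh -c "$build_cmd"; then
            echo "Build succeeded with $compiler"
            return 0
        fi
    done
    echo "Build failed with every compiler tried" >&2
    return 1
}

# Usage:
# build_with_first_working_compiler "make clean && make" intel/2016.4 gcc/5.4.0 gcc/4.8.5
```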

Python

To install Python packages, we recommend using virtualenv and pip. The "Scipy stack" of commonly used packages (numpy, pandas, matplotlib, IPython/Jupyter, etc.) can be loaded through the scipy-stack module. In most cases, the following command will be sufficient to install a package:

pip install --user packagename

Go

Go is loadable via the module system. To see the versions available locally on Frontenac, run:

module spider Go

R

Packages should be installed via install.packages(). If you are prompted to install packages in a personal library, type "yes". It is often easiest to specify the RStudio CRAN as a download mirror:

install.packages("packageName", repos="https://cran.rstudio.com")

For Bioconductor packages:

source("https://bioconductor.org/biocLite.R")
biocLite("packageName")

Perl

To install Perl modules to a local directory, use the following bash commands to create a localized install of whatever modules you may need. It's actually not as complicated as it looks.

# install local::lib
wget http://search.cpan.org/CPAN/authors/id/H/HA/HAARG/local-lib-2.000018.tar.gz
tar -xzf local-lib-2.000018.tar.gz
cd local-lib-2.000018
perl Makefile.PL --bootstrap
make test && make install

# setting up appropriate environment variables so that perl knows about our new ~/perl5/lib directory
cd ~                
echo 'eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >> ~/.bashrc
source ~/.bashrc

# check that local::lib is indeed installing to the right directory, you should see a bunch of paths beginning with ~/perl5/lib/perl5/ get printed out
perl -e 'print "@INC"'

With local::lib set up, you can now install whatever Perl modules you need from CPAN. For an example of using CPAN to install BioPerl, see below. This part requires a bit of babysitting: just hit Enter whenever a prompt comes up.

cpan
install CJFIELDS/BioPerl-1.6.924.tar.gz
q

Anaconda / Bioconda

One very useful software package for bioinformatics workflows (or anything requiring a large number and variety of software packages) is Anaconda. Anaconda ships a large amount of pre-compiled software that can be used independently of the module system. We recommend installing a personal copy of Miniconda in your home directory and starting from there. Bioinformatics users may also be interested in the Bioconda channel for installing common bioinformatics packages. Note that such an installation typically takes up a large number of files and a lot of space in your $HOME, and it is not necessarily optimized for the cluster environment. We strongly recommend using virtualenv tools for installing Python packages.

Please note that we provide very limited support for Miniconda on the Frontenac cluster.

Software installation assistance

If you would like a piece of software installed globally as a module, please contact us at cac.help@queensu.ca. We will perform the software installation in such a way that it is available on all newer Compute Canada systems (Graham/Cedar/Frontenac/Niagara/etc.) or on the local cluster filesystem under /global/software if a global install is not possible due to licensing or installation constraints.

As a policy, we do not install R / Python / Perl packages globally. Doing so can cause severe dependency/versioning issues for other users on the cluster. In rare cases, we may install Python packages into one of the anaconda modules for convenience.