Software:Frontenac
Latest revision as of 18:14, 2 August 2023
The Frontenac cluster includes a wide variety of software and compilers. There are several new ways of accessing and using software, which are documented here.
The module system
Frontenac uses the Lmod module system, as do the Alliance clusters and many other HPC clusters around the world.
On a large compute cluster, it is impossible to have all sets of software loaded all the time by default. Some software has multiple versions, some packages conflict with each other, and some pieces of software need to be configured separately for different use cases. Environment modules are designed to solve this problem by treating each software package and all of its associated files as a distinct package to be loaded on demand. Modules also handle the loading of dependencies. For instance, to use the R programming language you load the "r" module; any dependencies are handled behind the scenes by the module system without any user intervention.
Note: The default Standard Environment on the Frontenac cluster is StdEnv/2016.4. If you prefer, you can have the more modern StdEnv/2020 loaded automatically on login by creating a .modulerc file in your $HOME directory. Add the following line to this file:
module-version StdEnv/2020 default
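You can also make this change from the shell. The sketch below appends the line only when it is not already there, so running it twice still leaves a single entry. The function name set_default_stdenv is hypothetical, and the file defaults to $HOME/.modulerc as described above.

```shell
# Hypothetical helper: add the line that makes StdEnv/2020 the default
# environment to ~/.modulerc (or to the file given as the first argument).
set_default_stdenv() {
    local rcfile="${1:-$HOME/.modulerc}"
    local line='module-version StdEnv/2020 default'
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Usage: run once, then log in again for the new default to take effect.
# set_default_stdenv
```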
How to use the module system
What you want to do | Lmod command (Frontenac cluster) |
---|---|
See all available software | module avail |
See a short description of what each package does | module spider |
Load the software package "packageName" | module load packageName |
Use a specific version of a software package | module load packageName/version |
View currently loaded packages | module list |
Unload a package | module unload packageName |
Unload all packages | module purge |
Please note that all commands are case-sensitive. For an extremely comprehensive set of documentation on using the module system (such as how to write your own modules), refer to the official Lmod documentation: http://lmod.readthedocs.io/en/latest/
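The commands in the table above are often combined at the top of a job script. The following is a minimal sketch. It assumes a bash job script on a system where Lmod provides the module command, and the helper name load_modules is hypothetical.

```shell
# Hypothetical helper for the top of a job script: start from a clean
# environment, load each module given as an argument, then list them.
load_modules() {
    local spec
    # A plain purge leaves sticky modules (marked S in module list) loaded.
    module purge || return 1
    for spec in "$@"; do
        # spec is either "name" (default version) or "name/version"
        if ! module load "$spec"; then
            echo "load_modules: could not load $spec" >&2
            return 1
        fi
    done
    module list
}

# Usage:
# load_modules gcc/5.4.0 opencv python
```

Starting from a purge means the job does not depend on whatever was loaded in your login session.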
Local vs. Alliance software
Software on the Frontenac cluster can come from two sources: local installations or the Alliance's centralized software stack. The Alliance software stack is standardized and contains a set of software that is compiled and set up identically on every cluster where it is installed. This is a fantastic tool for reproducibility and for scaling your work across multiple clusters: the same software will work the same way regardless of where you use it. There is also a large amount of locally installed software; this is how most software that requires licensing or other special local considerations is installed. Both sets of software are used the same way: just run module load softwareName.
Please note that if you are the first user to use an Alliance software package on a node (or it has not been used in some time), the software may initially appear to "hang" and do nothing for several seconds on launch. This is normal: the software is being re-downloaded and cached on the local system. To tell whether a piece of software comes from this centralized stack, run which <some_command>. If the output begins with /cvmfs, it is part of the Alliance software stack.
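The which check can be wrapped in a small function. This is a hypothetical helper, not a cluster tool. It relies only on the POSIX command -v builtin and the /cvmfs prefix mentioned above. Shell builtins and functions are reported by name rather than by path, so the check is only meaningful for programs on disk.

```shell
# Hypothetical helper: report whether a command resolves to the Alliance
# software stack (a path under /cvmfs) or to something installed elsewhere.
command_origin() {
    local found
    found=$(command -v "$1") || { echo "not found"; return 1; }
    case $found in
        /cvmfs/*) echo "alliance: $found" ;;
        *)        echo "local: $found" ;;
    esac
}

# Usage:
# command_origin python3
```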
List of all installed software
For a reasonably up-to-date list of installed software, please check the Alliance software page (https://docs.computecanada.ca/wiki/Available_software). The most up-to-date list will be the module system itself. To see all available packages that do not conflict with the current environment, run the module avail command. Packages that are not compatible with the default environment are typically hidden (such as those compiled using the GNU compilers instead of the Intel compiler defaults).
Note that if you just want to load the default version of a piece of software, you do not need to include the version when loading a module. For instance, module load gcc will load the default version of GCC (version 5.4.0).
Finding a specific package with "module spider"
To check whether a package exists (compatible or not), use module spider packagename. This typically displays a list of package versions and instructions on how to load each one (dependencies may change between versions). Here's an example of how a user might find and use the OpenCV library.
First the user looks for packages. module avail shows way too many packages, and module avail opencv shows no match. One nice thing to note here is that module names are always lowercase.
[user@caclogin02 ~]$ module avail
-------------------------------------------------- /global/software/lmod/modules --------------------------------------------------
   abaqus/2017 (phys,D)      bayescan/g++540         flexbar/3.0.3                  hwt/hwt_7.2.gnu
   adf/2017_108 (chem)       chapel/1.15.0 (t)       freesurfer/6.0.0               hwt/hwt_7.2.intel (D)
   afni/17.3.05              cisc875/default         fsl/5.0.10 (D)                 ics/2017u1
   agouti/v0.3.3             comsol/53 (phys,D)      gaussian/g09e1_sse4 (chem)     matlab/R2017a (t)
   allpaths-lg/52488 (bio)   dotnet/1.0.4            gaussian/g16a3_sse4 (chem)     pyrx/094
   anaconda/2.7.13           dotnet/1.1.4            gaussian/g16b1_sse4 (chem,D)   qiime/1.9.1
   anaconda/3.5.3 (D)        dotnet/2.0.0 (D)        gurobi/752                     redundans/a6621dc
   ansys/ansys181 (phys,D)   faststructure/default   hisat2/2.1.0 (bio,D)
--------------------------------------------------- MPI-dependent avx2 modules ----------------------------------------------------
   abinit/8.2.2 (chem)     lumpy/0.2.13 (bio)    plumed/2.3.2 (chem,D)
   abinit/8.4.4 (chem,D)   meep/1.3 (phys)       pnetcdf/1.8.1 (io)
   abyss/1.5.2 (bio)       mpe2/2.4.9b (m)       psi4/1.1 (chem)
# more output omitted for brevity
[user@caclogin02 ~]$ module avail opencv
No modules found!
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
Instead, the user should use module spider opencv to find out whether the OpenCV module is present.
[user@caclogin02 ~]$ module spider opencv
-------------------------------------------------------------------------------------------------------------------------------
  opencv:
-------------------------------------------------------------------------------------------------------------------------------
    Description:
      OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products.
     Versions:
        opencv/2.4.13.3
        opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "opencv" module (including how to load the modules) use the module's full name.
  For example:
     $ module spider opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------
This command shows multiple versions. Next, let's find out how to load the latest one, 3.3.0:
[user@caclogin02 ~]$ module spider opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------
  opencv: opencv/3.3.0
-------------------------------------------------------------------------------------------------------------------------------
    Description:
      OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products.
    Properties:
      Visualisation software / Logiciels de visualisation
    You will need to load all module(s) on any one of the lines below before the "opencv/3.3.0" module is available to load.
      nixpkgs/16.09  gcc/5.4.0
      nixpkgs/16.09  gcc/5.4.0  cuda/8.0.44
    Help:
      Description
      ===========
      OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products.
      More information
      ================
       - Homepage: http://opencv.org/
In this case, it looks like nixpkgs and gcc are the two dependencies for OpenCV. The default intel module is incompatible with gcc, which is why these packages are hidden by default. Let's try loading the dependencies and then the "opencv" module.
[user@caclogin02 ~]$ module load nixpkgs/16.09 gcc/5.4.0
Lmod is automatically replacing "intel/2016.4" with "gcc/5.4.0".
Due to MODULEPATH changes, the following have been reloaded:
  1) openmpi/2.1.1     2) r/3.4.3
[user@caclogin02 ~]$ module load opencv
The OpenCV module now works as advertised. Here is a demonstration using the OpenCV Python library:
[user@caclogin02 ~]$ module load python
The following have been reloaded with a version change:
  1) python/3.5.2 => python/3.5.4
[user@caclogin02 ~]$ python3
Python 3.5.4 (default, Dec 4 2017, 16:30:40)
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.3.0'
>>>
To sum things up, the user could access the opencv module with the following in their jobs:
# nixpkgs is always loaded unless you completely unload the software stack,
# and gcc/5.4.0 is the default version of GCC
module load gcc
module load opencv
module load python
python3 opencv_script.py
Completely unloading the software stack
The Compute Canada software stack provides a complete set of binaries, libraries, development headers, and other system utilities. These can interfere if you want to compile against, or use, the default "vanilla" Linux installation. To completely unload all traces of the Compute Canada software stack, use the following command:
[user@caclogin02 ~]$ module purge --force
[user@caclogin02 ~]$ module list
No modules loaded
To re-load the software stack:
[user@caclogin02 ~]$ module load StdEnv
[user@caclogin02 ~]$ module list
Currently Loaded Modules:
  1) nixpkgs/16.09     (S)   3) gcccore/.5.4.0     (H)   5) intel/2016.4      (t)   7) openmpi/2.1.1  (m)
  2) icc/.2016.4.258   (H)   4) ifort/.2016.4.258  (H)   6) imkl/11.3.4.258 (math)   8) StdEnv/2016.4  (S)
  Where:
   S:     Module is Sticky, requires --force to unload or purge
   m:     MPI implementations / Implémentations MPI
   math:  Mathematical libraries / Bibliothèques mathématiques
   t:     Tools for development / Outils de développement
   H:     Hidden Module
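The unload and reload steps above can be wrapped around a single command, for example a build that should see only the system compilers and libraries. This is a sketch with a hypothetical function name. Note that it reloads only the default StdEnv stack, so you must load any extra modules again yourself.

```shell
# Hypothetical helper: run one command with the whole software stack
# unloaded (e.g. to build against the system compilers and libraries),
# then reload the default stack afterwards.
with_vanilla_env() {
    module purge --force          # also removes sticky modules (StdEnv, nixpkgs)
    "$@"
    local status=$?
    module load StdEnv            # restore the default software stack
    return "$status"
}

# Usage:
# with_vanilla_env make
```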
Software installation
There is nothing preventing users from installing their own software in their own home directory. Compile software normally, and it should work. It is not necessary to run "sudo make install" or "make install" steps, as these will typically attempt to install software to the wrong location. Similarly, do not ask us to run "sudo yum install"/"sudo apt install" commands on the cluster: this would install things only on the local machine and would not take effect for jobs running on the worker nodes. Dependencies such as libraries and R packages can be installed either on the cluster filesystem (your home directory, for instance) or through the Compute Canada build system. If a dependency or library is not present, please contact us, and we can install it for you on the cluster filesystem or in the Compute Canada software stack, as appropriate.
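Once a program is compiled in your home directory, its executables need to be on your PATH. This is a minimal sketch. It assumes a hypothetical layout where each package lives under $HOME/software/<name>/bin, and the helper name prepend_path is also hypothetical.

```shell
# Hypothetical helper: put a directory of self-built programs at the front
# of PATH, skipping it if it is already there (safe to call from ~/.bashrc).
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already on PATH: nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;     # avoid a stray ':' if PATH is empty
    esac
    export PATH
}

# Usage, assuming a program was built under ~/software/mytool (hypothetical):
# prepend_path "$HOME/software/mytool/bin"
```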
Language-specific instructions
C / C++ / Fortran
The default compilers are the Intel compiler suite. Though these typically give better performance than the GNU compilers, they do not necessarily work with all software packages. If you run into compilation issues, try the following compilers in order (generally, the older the compiler, the more likely it is to work with old software).
- intel/2016.4 - (this is the default)
- gcc/5.4.0 - GNU compilers generally have better compatibility. This is the default GNU compiler.
- gcc/4.8.5 - The same version shipped with CentOS/RHEL 7, generally should work with most software and supports C++11.
- ics/2017u1 - We maintain an independent version of the Intel compilers that is not on the Compute Canada software stack. Unload the software stack with module purge --force and attempt to recompile using the ics module.
- GCC 4.8.5 (system compilers) - Unload the software stack with module purge --force and attempt to recompile. You may need to install the required development headers and libraries. If the dependency list is long, you may want to look into installing software via EasyBuild (the same build system Compute Canada uses, but it will take a very long time because it compiles everything from scratch).
- Contact us - If you've reached this point, the software likely requires special attention. Contact us at cac.help@queensu.ca, and we can have a go at installing the software.
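You can script your way through the module-based compilers in the list above. This is a hypothetical sketch, and try_compilers is not a cluster tool. It assumes your build is driven by a single command such as make. It does not cover the ics or system-compiler fallbacks, which require unloading the whole stack.

```shell
# Hypothetical sketch: try the module-based compilers in the order listed
# above and stop at the first one for which the build command succeeds.
# If your build leaves partial output behind, clean it between attempts.
try_compilers() {
    local compiler
    for compiler in intel/2016.4 gcc/5.4.0 gcc/4.8.5; do
        # Lmod swaps out the previously loaded compiler automatically
        module load "$compiler" || continue
        if "$@"; then
            echo "try_compilers: built with $compiler" >&2
            return 0
        fi
    done
    echo "try_compilers: no compiler in the list worked" >&2
    return 1
}

# Usage:
# try_compilers make
```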
Python
To install Python packages, we recommend using virtualenv and pip. The "Scipy stack" of commonly used packages (numpy, pandas, matplotlib, IPython/Jupyter, etc.) can be loaded through the scipy-stack module. In most cases, the following command will be sufficient to install a package:
pip install --user packagename
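Here is a minimal sketch of the virtualenv route, assuming a Python module has been loaded first. It uses Python's standard-library venv module as a stand-in for the separate virtualenv tool named above; both create an isolated environment with its own pip. The function name make_venv and the example directory are hypothetical.

```shell
# Minimal sketch: create an isolated Python environment and activate it in
# the current shell. Uses the standard-library venv module as a stand-in
# for virtualenv; on Frontenac, load a Python module first.
make_venv() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "usage: make_venv DIR" >&2
        return 2
    fi
    python3 -m venv "$dir" || return 1
    # activate sets VIRTUAL_ENV and puts $dir/bin first on PATH
    . "$dir/bin/activate"
}

# Usage (directory name is only an example):
# module load python
# make_venv "$HOME/venvs/myproject"
# pip install packagename
```

Inside an activated environment, pip install packagename installs into that environment, so the --user flag is not needed.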
Go
Go is available through the module system. To see the versions available on Frontenac, run:
module spider Go
R
Packages should be installed via install.packages(). If you are prompted to install packages in a personal library, type "yes". It is often easiest to specify the RStudio CRAN as a download mirror:
install.packages("packageName", repos="https://cran.rstudio.com")
For Bioconductor packages:
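source("https://bioconductor.org/biocLite.R")
biocLite("packageName")
Note that Bioconductor 3.8 and later (which require R 3.5 or newer) replace biocLite with the BiocManager package: run install.packages("BiocManager"), then BiocManager::install("packageName").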
Perl
To install Perl modules to a local directory, use the following bash commands to create a localized install of whatever modules you may need. It's actually not as complicated as it looks.
# install local::lib
wget http://search.cpan.org/CPAN/authors/id/H/HA/HAARG/local-lib-2.000018.tar.gz
tar -xzf local-lib-2.000018.tar.gz
cd local-lib-2.000018
perl Makefile.PL --bootstrap
make test && make install
# setting up appropriate environment variables so that perl knows about our new ~/perl5/lib directory
cd ~
echo 'eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >> ~/.bashrc
source ~/.bashrc
# check that local::lib is indeed installing to the right directory, you should see a bunch of paths beginning with ~/perl5/lib/perl5/ get printed out
perl -e 'print "@INC"'
You can now install whatever Perl modules you need from CPAN. For an example of using CPAN to install BioPerl, see below. This part requires a bit of babysitting: just press Enter whenever a prompt comes up.
cpan
install CJFIELDS/BioPerl-1.6.924.tar.gz
q
Anaconda / Bioconda
One very useful software package for bioinformatics workflows (or just anything requiring a large variety and number of different software packages) is Anaconda. Anaconda ships a large amount of pre-compiled software that can be used independently of the module system. We recommend installing a personal copy of Miniconda in your home directory and starting from there. Bioinformatics users may also be interested in the Bioconda channel to install common bioinformatics packages. However, this typically takes up a large number of files and a lot of space in your $HOME, and it is not necessarily optimized for the cluster environment. We strongly recommend using virtualenv tools for installing Python packages.
Please note that we provide very limited support for Miniconda on the Frontenac cluster.
Software installation assistance
If you would like a piece of software installed globally as a module, please contact us at cac.help@queensu.ca. We will perform the software installation in such a way that it is available on all newer Compute Canada systems (Graham/Cedar/Frontenac/Niagara/etc.) or on the local cluster filesystem under /global/software if a global install is not possible due to licensing or installation constraints.
As a policy, we do not install R / Python / Perl packages globally. Doing so can cause severe dependency/versioning issues for other users on the cluster. In rare cases, we may install Python packages into one of the anaconda modules for convenience.