Frontenac:Migration

From CAC Wiki
Revision as of 19:34, 16 May 2017 by Hasch


Migrating to the new Frontenac (CAC) cluster

This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").

Migration Q&A

  • Q: Who migrates?
A: Eventually, all of our users will migrate from the old SW (Linux) cluster to the new "Frontenac" (CAC) cluster.
  • Q: Can I use my old "stuff"?
A: Much of the old data and software will be usable on the new systems. However, the data will have to be copied over, because the new systems use a separate file system and cross-access is not possible.
  • Q: Do I have to re-compile?
A: You may have to re-compile some of the software you are using. We will assist you with this.
  • Q: Do I copy my files over myself?
A: No. We will do this for you in the course of the coming months.
  • Q: Is this optional?
A: Unfortunately not. We will move both user data and hardware according to a schedule.
  • Q: Can I decide when to move?
A: To a degree. We are open to "early adopters". Once they have been moved, we will move the bulk of users according to our schedule.
  • Q: Will this disrupt my research?
A: We will do our level best to keep disruptions to a minimum. We will give you a chance to "practise" on the new systems while you still have access to the old ones. Once you are on the new systems, access to the old ones will be cut to preserve data integrity.

Why migrate?

Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems and their replacement by new x86/Intel-based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced its original hardware completely. However, this gradual replacement did not address issues in the base structure of that cluster, such as an old scheduler system and a less-than-cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time to re-design our main compute cluster.

What's Different?

The new cluster is based on a newer version of the same operating system, CentOS. We have replaced the scheduler with a "new generation" one called SLURM, the same scheduler used on the new Compute Canada "GP" systems. We have also replaced our environment management system (usepackage) with the more powerful and widely used "lmod". The main changes are summarized in the following table.

Differences between the "old" SW (Linux) and "new" CAC (Frontenac) clusters
Feature | old SW (Linux) cluster | new CAC (Frontenac) cluster
Operating System | CentOS 6 | CentOS 7
File System type | NFS | GPFS
Scheduler | Sun Grid Engine (SGE) | SLURM
Software Manager | usepackage | lmod
Backup management | ??? | Hierarchical Storage Management (HSM)
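
The scheduler change means that SGE "#$" directives in existing job scripts must be rewritten as SLURM "#SBATCH" directives. As a rough illustration, here is a minimal sketch of a hypothetical Bash helper, sge_to_slurm (not a CAC-provided tool), that translates a few common SGE options (job name, output and error files, mail address, wall time) and flags everything else for manual review. Parallel environments are deliberately not translated automatically, because the right SLURM option depends on whether a code uses MPI or threads. Setup lines in the script body (e.g. "use openmpi") also have to be changed by hand to lmod "module load" commands; the module names on Frontenac are not listed here, so check "module avail".

```shell
#!/bin/bash
# Hypothetical helper: translate one SGE "#$" directive line into a rough
# SLURM "#SBATCH" equivalent. Only a few common options are handled;
# everything else is flagged with a "TODO (manual)" comment.
sge_to_slurm() {
  local line=$1
  local -a f
  # Lines that are not SGE directives are returned unchanged.
  if [[ $line != '#$'* ]]; then
    printf '%s\n' "$line"
    return
  fi
  # Drop the two-character "#$" prefix and split option and arguments.
  read -r -a f <<< "${line:2}"
  case "${f[0]}" in
    -N)   echo "#SBATCH --job-name=${f[1]}" ;;
    -o)   echo "#SBATCH --output=${f[1]}" ;;
    -e)   echo "#SBATCH --error=${f[1]}" ;;
    -M)   echo "#SBATCH --mail-user=${f[1]}" ;;
    -cwd) echo "# -cwd dropped: SLURM starts jobs in the submission directory by default" ;;
    -l)
      case "${f[1]}" in
        *,*)    echo "# TODO (manual): $line" ;;  # several resources in one -l
        h_rt=*) echo "#SBATCH --time=${f[1]#h_rt=}" ;;
        *)      echo "# TODO (manual): $line" ;;
      esac ;;
    -pe)  echo "# TODO (manual): ${f[2]} slots; use --ntasks for MPI or --cpus-per-task for threaded codes" ;;
    *)    echo "# TODO (manual): $line" ;;
  esac
}

# Convert a whole SGE script read on stdin, line by line.
convert_sge_script() {
  local l
  while IFS= read -r l || [[ -n $l ]]; do
    sge_to_slurm "$l"
  done
}
```

For example, "convert_sge_script < myjob.sge > myjob.slurm" produces a first draft; every TODO line still needs a manual decision before the script is submitted with sbatch.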

Migration Schedule

Compiling Code

The standard Fortran/C/C++ compilers differ between the Solaris and the Linux systems. This section discusses the compilers on the x86/Linux platform and compares them with the Solaris ones in the table below. The Linux platform offers two compiler suites, GNU and Intel, which are listed separately; GNU is the default. The table also lists the MPI-related commands for setup, compilation, and runtime.

Fortran/C/C++ Compilers: Sparc/Solaris to x86/Linux
Item | Sparc/Solaris | x86/Linux (GNU) | x86/Linux (Intel)
Name/Version | Studio 12.4 | GNU gcc 4.4.7 | Intel 12.1
Setup command | none (default) | none (default) | use icsmpi
MPI setup | none (default) | use openmpi | use icsmpi
Fortran / C / C++ compilers | f90 / cc / CC | gfortran / gcc / g++ | ifort / icc / icpc
MPI compiler wrappers | mpif90 / mpicc / mpiCC | mpif90 / mpicc / mpicxx | mpiifort / mpiicc / mpiicpc
MPI runtime environment | mpirun | mpirun | mpirun

Note that all programs that ran on the Solaris platform have to be re-compiled on Linux. Sparc and x86 processors use different instruction sets, so binaries are not compatible.
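
The compiler table above maps directly onto command lines. The sketch below is a hypothetical helper, mpi_build_steps (not a site-provided tool), that prints, without running anything, the setup command and the MPI compile line for a given toolchain ("gnu" or "intel") and source file. The wrapper and setup names come from the table; the -O2 flag and the mapping of the file extensions .f90, .c and .cpp are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print (without running) the setup command and the MPI
# compile line for a source file, using the names from the compiler table.
# The -O2 flag and the file-extension mapping are illustrative assumptions.
mpi_build_steps() {
  local toolchain=$1 src=$2 setup wrapper
  case "$toolchain" in
    gnu)   setup="use openmpi" ;;
    intel) setup="use icsmpi" ;;
    *)     echo "unknown toolchain: $toolchain" >&2; return 1 ;;
  esac
  # Pick the MPI compiler wrapper from the toolchain and the file extension.
  case "$toolchain:${src##*.}" in
    gnu:f90)   wrapper=mpif90 ;;
    gnu:c)     wrapper=mpicc ;;
    gnu:cpp)   wrapper=mpicxx ;;
    intel:f90) wrapper=mpiifort ;;
    intel:c)   wrapper=mpiicc ;;
    intel:cpp) wrapper=mpiicpc ;;
    *)         echo "unsupported source file: $src" >&2; return 1 ;;
  esac
  # The executable is named after the source file without its extension.
  printf '%s\n' "$setup" "$wrapper -O2 -o ${src%.*} $src"
}
```

For example, "mpi_build_steps intel solver.f90" prints "use icsmpi" followed by "mpiifort -O2 -o solver solver.f90"; those two lines are what you would type on the Linux login node.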

MPI

OpenMPI is used on both the Solaris and the Linux systems. On the Solaris platform it was integrated with the standard Studio compilers. On the Linux platform, two MPI distributions are in use:

  • A stand-alone version of OpenMPI 1.8, used in combination with the gcc compiler and set up through the command "use openmpi".
  • Intel MPI 4.0 Update 3, used with the Intel compilers and set up together with them ("use icsmpi").

All of these versions use the mpirun command to invoke the runtime environment. Run "which mpirun" to see which version you are currently using.
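
Because every MPI setup provides a command called mpirun, it is easy to launch a program with the wrong one. A minimal sketch of a check, using a hypothetical function report_mpirun and the shell builtin "command -v" (equivalent in effect to "which"), is shown below. The install paths of the two MPI distributions on the cluster are not documented here and are not assumed.

```shell
#!/bin/bash
# Report which mpirun launcher is first on the PATH. After "use openmpi" this
# should be OpenMPI 1.8; after "use icsmpi" it should be Intel MPI.
report_mpirun() {
  local path
  if path=$(command -v mpirun); then
    echo "active mpirun: $path"
  else
    echo "no mpirun on PATH; run \"use openmpi\" or \"use icsmpi\" first" >&2
    return 1
  fi
}
```

Once the expected launcher is active, a program is started with, for example, "mpirun -np 4 ./hello_mpi", where hello_mpi is a hypothetical executable built with the matching compiler wrapper.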

New Scheduler

Scheduling

Both the "old" M9000 servers and the "new" SW (Linux) cluster use Sun Grid Engine (SGE) as a scheduler. Please consult our Scheduler Help File for details about its usage. The following table gives an overview of the changes that need to be made to a submission script if the job is to run on the Linux production nodes, i.e., the "SW cluster".

Changes in SGE submissions when migrating from Sparc/Solaris to x86/Linux
Setting | Sparc/Solaris | x86/Linux
Queue name | m9k.q (old default, deprecated) | abaqus.q (new default)
Node names | m9k000* | sw00**, cac0**
Login node for submission | sflogin0 | swlogin1
Relative serial execution speed | 1 | 3-6
Suggested relative Nprocs | 1 | 1/2
Queue specification in submit script | none | none
Gaussian parallel environment | #$ -pe gaussian.pe | #$ -pe glinux.pe
Gaussian setup line | . /opt/gaussian/setup.sh | . /opt/gaussian/setup.sh

We strongly suggest lowering the number of processes requested when submitting to the SW cluster. The nodes are substantially smaller than the M9000 servers, but provide greatly improved per-core performance. This means that even with half the core count, a speedup of 2-3 is likely.
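
The expected speedup can be estimated from the relative figures in the table (serial speed 3-6 on Linux versus 1 on the M9000, and half the number of processes). Assuming, as a simplification, that a job's speed scales linearly with the number of cores, the speedup S is the per-core speed ratio s times the ratio of core counts:

```latex
S \approx s \cdot \frac{N_{\text{SW}}}{N_{\text{M9000}}}
  = (3\ \text{to}\ 6) \times \frac{1}{2}
  = 1.5\ \text{to}\ 3
```

Since real codes rarely scale perfectly, halving the core count usually costs less than a factor of two, so this figure behaves more like a lower bound, which is in line with the speedup of 2-3 mentioned above.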

We have added some entries to the table describing modifications that apply only to jobs running the computational chemistry software Gaussian. For more details about this software, please consult our Gaussian Help File. Gaussian submissions go to a dedicated large node on the SW cluster that uses local scratch space to improve performance and avoid I/O bandwidth issues.
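
To show how the Gaussian entries of the table fit together, here is a minimal sketch of a hypothetical Bash function, make_gaussian_job, that prints an SGE submission script for the SW cluster using the glinux.pe parallel environment and the setup line from the table. The job name, slot count and input file are placeholders, and the Gaussian command g09 is an assumption; check the Gaussian Help File for the command matching the installed version. "-S /bin/bash" and "-cwd" are standard SGE options chosen for illustration, not site requirements. Following the suggested relative Nprocs of 1/2, a job that used 24 slots on the M9000 would start with about 12 here.

```shell
#!/bin/bash
# Hypothetical helper: print an SGE submission script for a Gaussian job on the
# SW (Linux) cluster. Job name, slot count and input file are placeholders;
# the g09 command is an assumption (check the Gaussian Help File).
make_gaussian_job() {
  local name=$1 nslots=$2 input=$3
  # "\$" keeps the SGE "#$" prefix literal inside the here-document.
  cat <<EOF
#!/bin/bash
#\$ -S /bin/bash
#\$ -N ${name}
#\$ -cwd
#\$ -pe glinux.pe ${nslots}
. /opt/gaussian/setup.sh
g09 < ${input} > ${input%.*}.log
EOF
}
```

On swlogin1, for example: "make_gaussian_job h2o 12 h2o.com > h2o.sh && qsub h2o.sh". No queue is specified, in line with the table.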

Help

If you have questions that you can't resolve by checking the documentation, email cac.help@queensu.ca.