Difference between revisions of "Frontenac:Migration"
Revision as of 18:45, 17 May 2017
Migrating to the new Frontenac (CAC) cluster
!!! This guide is seriously under construction. Please do not rely on anything you read here until this warning is removed !!!
This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").
Migration Q&A
Why migrate?
Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems and their replacement by new x86/Intel-based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced it completely. However, this gradual replacement did not address issues in the base structure of that cluster, such as an outdated scheduler and a less-than-cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time to redesign our main compute cluster. In addition, some of our storage components are reaching their "end of life" and will be retired within a year. Rather than permanently operating two separate clusters, we will gradually move both our users and the compute hardware from one cluster/network to the other. We will do so over the course of several months to give individual users plenty of time to familiarize themselves with the new environment and "wrap up" their work on the old one, thus minimizing the impact on their research. However, in the interest of consistency, we cannot make this process optional: all users must be moved to the new cluster by early 2018, when the service contracts for the old components run out.
What's Different?
The new cluster is based on a newer version of the same operating system, CentOS (version 7 instead of 6). We have replaced the scheduler with a "new generation" one called SLURM, the same scheduler used on the new Compute Canada "GP" systems. We also replaced our "environment management system" with the more powerful and more standard Lmod. Here are the main changes in table format.
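Because the old SW cluster uses Sun Grid Engine and the new cluster uses SLURM, the "#$" directives in existing job scripts have to be rewritten as "#SBATCH" lines. The Python sketch below illustrates that translation for a handful of common options. It is a minimal sketch under our own assumptions, not an official conversion tool: the option mapping covers only simple cases (for example, a "-pe" slot count is mapped to --ntasks, which may not suit threaded jobs), and anything it does not recognize is kept as a comment for manual editing. Check the result against the SLURM documentation before submitting.

```python
import shlex
from typing import List

# Assumed one-to-one equivalents for the simplest SGE options.
SIMPLE_OPTIONS = {
    "-N": "job-name",   # job name
    "-o": "output",     # stdout file
    "-e": "error",      # stderr file
    "-M": "mail-user",  # notification address
}


def sge_time_to_slurm(value: str) -> str:
    """Convert an SGE h_rt value to a SLURM --time value.

    SGE accepts hh:mm:ss or a plain number of seconds; sbatch would read a
    bare number as minutes, so plain seconds are converted to h:mm:ss.
    """
    if ":" in value:
        return value
    seconds = int(value)
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def translate_directive(line: str) -> List[str]:
    """Translate one '#$ ...' SGE directive line into '#SBATCH' lines."""
    tokens = shlex.split(line[2:])  # drop the leading '#$'
    out: List[str] = []
    i = 0
    while i < len(tokens):
        opt = tokens[i]
        arg = tokens[i + 1] if i + 1 < len(tokens) else None
        if opt in SIMPLE_OPTIONS and arg is not None:
            out.append(f"#SBATCH --{SIMPLE_OPTIONS[opt]}={arg}")
            i += 2
        elif opt == "-l" and arg is not None and arg.startswith("h_rt="):
            out.append(f"#SBATCH --time={sge_time_to_slurm(arg[5:])}")
            i += 2
        elif opt == "-pe" and i + 2 < len(tokens):
            # '-pe <env> <slots>': map slots to tasks (for threaded jobs,
            # --cpus-per-task may be the better match)
            out.append(f"#SBATCH --ntasks={tokens[i + 2]}")
            i += 3
        elif opt == "-cwd":
            i += 1  # sbatch starts jobs in the submission directory by default
        elif opt == "-V":
            out.append("#SBATCH --export=ALL")
            i += 1
        else:
            # Unknown or compound option: keep it visible for manual editing
            out.append("# UNTRANSLATED SGE directive: " + " ".join(tokens[i:]))
            break
    return out


def translate_script(text: str) -> str:
    """Rewrite every '#$' line of a job script; leave all other lines alone."""
    result: List[str] = []
    for line in text.splitlines():
        if line.startswith("#$"):
            result.extend(translate_directive(line))
        else:
            result.append(line)
    return "\n".join(result)


if __name__ == "__main__":
    sge_script = "\n".join([
        "#!/bin/bash",
        "#$ -N myjob",
        "#$ -cwd",
        "#$ -l h_rt=3600",
        "#$ -pe shm.pe 4",  # 'shm.pe' is a hypothetical environment name
        "./my_program",
    ])
    print(translate_script(sge_script))
```

Run on the sample script, this keeps the shebang and the program line, drops "-cwd", and turns the remaining directives into --job-name=myjob, --time=1:00:00 and --ntasks=4.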
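Migration Schedule
The migration proceeds according to a scheme designed to minimize the impact on operations and on users' research activities. Research groups migrate as a whole during a 3-4 week period. The migration procedure has four steps:
* 1 - Initiation of the migration process
** Contact the PI to determine migration requirements and schedule a time.
** Contact all researchers and issue temporary credentials for the new cluster.
** Create a temporary account with restricted access (sandbox) on the new cluster.
* 2 - Rolling rsync of user data
** Repeated until an update takes less than 2 hours:
*** /home/hpcXXXX
*** …
** This period can take up to 4 weeks. The old cluster is still fully accessible.
** Dedicated or associate hardware will also be moved and updated during this period.
* 3 - Final group migration, user lockout
** Necessary to ensure data integrity.
** A final rsync is performed on "quiet" data.
** At the end, access to the old cluster is blocked.
** All jobs on the old cluster are terminated.
* 4 - Full access to the new cluster
** The home directory on the new cluster becomes /global/home/hpcXXXX.
** Feedback is requested from users.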
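The rolling synchronization in step 2 amounts to a simple loop: copy, measure how long the pass took, and repeat until a pass finishes in under two hours, at which point the remaining changes are small enough for the final copy during the lockout in step 3. The Python sketch below illustrates that logic only; it is not the script used for the migration, and the rsync flags and the max_passes safety limit are assumptions.

```python
import subprocess
import time
from typing import Callable, List

TWO_HOURS = 2 * 60 * 60  # step 2 stops once a pass takes less than this (seconds)


def rsync_pass(src: str, dst: str) -> float:
    """Run one rsync pass from src to dst and return its duration in seconds.

    The flags are an assumption, not the options used by the actual migration:
    -a preserves permissions and timestamps, --delete mirrors removals.
    """
    start = time.monotonic()
    subprocess.run(["rsync", "-a", "--delete", src, dst], check=True)
    return time.monotonic() - start


def rolling_sync(run_pass: Callable[[], float],
                 threshold: float = TWO_HOURS,
                 max_passes: int = 20) -> List[float]:
    """Repeat sync passes until one of them finishes faster than `threshold`.

    Each pass only copies what changed since the previous one, so pass times
    shrink as long as data changes more slowly than it is copied. Returns the
    duration of every pass; if the last one is still above the threshold,
    max_passes was reached and the caller has to decide how to proceed.
    """
    durations: List[float] = []
    for _ in range(max_passes):
        elapsed = run_pass()
        durations.append(elapsed)
        if elapsed < threshold:
            break  # remaining delta is small: ready for the final locked pass (step 3)
    return durations


if __name__ == "__main__":
    # Simulated pass times in seconds instead of real rsync runs
    simulated = iter([108000, 21600, 9000, 4320, 1800])
    print(rolling_sync(lambda: next(simulated)))  # [108000, 21600, 9000, 4320]
```

For a real run, the idea would be a call such as rolling_sync(lambda: rsync_pass("/home/hpcXXXX/", "/global/home/hpcXXXX/")), assuming both file systems are visible from one machine; both paths are placeholders, and the trailing slash makes rsync copy the directory's contents rather than the directory itself.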
Compiling Code
The standard Fortran/C/C++ compilers differ between the Solaris and the Linux systems. Those on the x86/Linux platform are discussed here. Here is a comparison in table form. Since there are two compilers (GNU and Intel) on the Linux platform, they are treated separately; the default is GNU. We also list the MPI-related commands for setup, compilation, and runtime.
Note that all programs that were running on the Solaris platform have to be recompiled on Linux. Binaries are not compatible because the two platforms use different instruction sets (SPARC vs. x86).
MPI
On both Solaris and Linux systems, the MPI distribution used is OpenMPI. On the Solaris platform, it was integrated with the standard Studio compilers. On the Linux platform, two versions are in use:
All of these versions use the mpirun command to invoke the runtime environment. To see which version you are currently using, check the output of "which mpirun".
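As a rough illustration of the points above, the Python sketch below picks the compiler command for each language under the two Linux toolchains (GNU as the default, Intel as the alternative) or the corresponding Open MPI wrapper, and does the equivalent of "which mpirun" to show which Open MPI installation comes first on your PATH. The command names are the standard GNU, Intel, and Open MPI ones; whether, and under which module, each is available on our systems is an assumption you should check.

```python
import os
import shutil
from typing import Optional

# Assumed driver names: the standard GNU and Intel (classic) compilers.
COMPILERS = {
    "gnu":   {"c": "gcc", "c++": "g++", "fortran": "gfortran"},
    "intel": {"c": "icc", "c++": "icpc", "fortran": "ifort"},
}
# Open MPI wrapper commands; each one calls the compiler that particular
# Open MPI build was configured with, so load the matching version first.
MPI_WRAPPERS = {"c": "mpicc", "c++": "mpicxx", "fortran": "mpif90"}


def compiler_for(language: str, toolchain: str = "gnu", mpi: bool = False) -> str:
    """Return the compiler command for a language; GNU is the default toolchain."""
    if mpi:
        return MPI_WRAPPERS[language]
    return COMPILERS[toolchain][language]


def mpi_prefix(mpirun_path: str) -> str:
    """Installation prefix of an mpirun located at <prefix>/bin/mpirun."""
    return os.path.dirname(os.path.dirname(mpirun_path))


def current_mpirun() -> Optional[str]:
    """Python counterpart of 'which mpirun': first mpirun on PATH, or None.

    Symlinks are resolved so the prefix points to the real installation.
    """
    found = shutil.which("mpirun")
    return os.path.realpath(found) if found else None


if __name__ == "__main__":
    print("Fortran (GNU):", compiler_for("fortran"))
    print("C (Intel):", compiler_for("c", toolchain="intel"))
    print("MPI C++ wrapper:", compiler_for("c++", mpi=True))
    path = current_mpirun()
    print("mpirun:", path, "prefix:", mpi_prefix(path) if path else "n/a")
```

Keeping the toolchain as an explicit parameter with "gnu" as its default mirrors the text: GNU is used unless you deliberately switch to Intel.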
New Scheduler
Scheduling
Both the "old" M9000 servers and the "new" SW (Linux) cluster use Sun Grid Engine as a scheduler. Please consult our Scheduler Help File for details about its usage. The following table gives an overview of the alterations that need to be made to a submission script if execution is to take place on the Linux production nodes, i.e. the "SW cluster".
Note that we strongly suggest lowering the number of processes requested when submitting to the SW cluster. Its nodes are substantially smaller than the M9000 servers but provide greatly improved per-core performance, so even with half the core count, a speedup of 2-3 is likely. We have added some entries to the table describing modifications that apply only to jobs running the computational chemistry software Gaussian. For more details about this software, please consult our Gaussian Help File. Gaussian submissions go to a dedicated large node on the SW cluster that uses local scratch space to improve performance and avoid I/O bandwidth issues.
Help
If you have questions that you can't resolve by checking the documentation, email cac.help@queensu.ca.