Latest revision as of 15:41, 8 February 2018

Quickstart

If your account has been migrated to Frontenac, you can gain access to the new system by following the steps below.

Visit https://login.cac.queensu.ca/pwr to obtain a temporary password. You must use the original email you registered with.

Log on to the new system using an SSH client (MobaXterm on Windows, Terminal on macOS/Linux): ssh yourUsername@login.cac.queensu.ca. The first time you log in, the system will prompt you to change your temporary password and then log you out, so that you can test logging in with the new password.

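A set of guides on how to:

- log into the system
- set up and use software
- find your way around the filesystems
- submit jobs using SLURM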

Migrating to the new Frontenac cluster

This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").

Note: We are in the final phase of the migration process. All users will gain access to the new systems by mid-November, and lose access to the old systems in early January 2018. Scheduling of new jobs on the old system will stop in mid-December! Please make yourself familiar with the new systems.

Why migrate ?

Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems, and their replacement by new X86/Intel based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced it completely. However, this gradual replacement did not address issues in the base structure of that cluster, such as an old scheduler system, and a less than cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time for a re-design of our main compute cluster. Some of our storage components reach their "end of life" phase and will be retired within a year.

Rather than permanently operating two separate clusters, we will move both our users and the compute hardware from one cluster/network to the other. In the interest of consistency, we cannot make this process optional. We must move all our users to the new cluster by early 2018, when service contracts for the old components run out.

What's Different ?

The new cluster is based on a newer version of the same CentOS operating system. We have replaced the scheduler with SLURM, the same scheduler that is used on the new Compute Canada "GP" systems. We also replaced the "use" system with the more powerful and standard "lmod". Here are the main changes in table format.

	old SW (Linux) cluster	new CAC (Frontenac) cluster
Operating system	CentOS 6	CentOS 7
File system type	ZFS	GPFS
Scheduler	Sun Grid Engine (SGE)	SLURM
Software manager	usepackage	lmod
Backup management	samfs	Hierarchical Storage Management (HSM)
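
For users coming from Sun Grid Engine and usepackage, the most visible day-to-day changes are the commands used to submit jobs and to load software. The following is a minimal sketch, not a site-approved template: it maps a few common SGE commands to their usual SLURM counterparts and prints a small SLURM batch script. The helper names, the job name, the resource values, the module name gcc and the program name my_program are all illustrative assumptions; the SLURM and software guides describe the values that actually apply on Frontenac.

```shell
#!/bin/sh
# Map a Sun Grid Engine command to its usual SLURM counterpart.
# The helper name is hypothetical; the commands themselves are standard.
sge_to_slurm() {
    case "$1" in
        qsub)  echo "sbatch"  ;;   # submit a batch script
        qstat) echo "squeue"  ;;   # list queued and running jobs
        qdel)  echo "scancel" ;;   # cancel a job
        *)     echo "no mapping for: $1" >&2; return 1 ;;
    esac
}

# Print a minimal SLURM batch script. Resource values, the module name
# and the program name are placeholders, not recommended settings.
# "module load" is the lmod counterpart of the old "use" command.
print_slurm_template() {
    cat <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1G
module load gcc
./my_program
EOF
}

sge_to_slurm qsub
print_slurm_template
```

Saving the printed template to a file and passing that file to sbatch would submit it; the SGE directives in an existing script (lines starting with #$) need to be rewritten as #SBATCH lines by hand.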

Migration Time Table

Different users will migrate at different times. We have been moving data to the new file system for months, so that at the time when "it's your turn" your data will already be available on the new system. Here is a month-by-month outline of who will move when. If you want to migrate ahead of schedule, or you have compelling reasons to delay the move, please get in touch with us at cac.help@queensu.ca

Month (2017)	Who moves ?
September	De-activated users; users who have not run a scheduled job for > 6 months; volunteers
October	New accounts (i.e. new users will be going straight to Frontenac); users who have not run a scheduled job for > 3 months; volunteers
November	New accounts (i.e. new users will be going straight to Frontenac)
December	New accounts (i.e. new users will be going straight to Frontenac); everyone

We will transfer hardware from the "old" cluster (SW) to the new one (Frontenac) to accommodate the migrated users. This means that in the transition period, the old cluster will gradually become smaller while the new one grows. Dedicated hardware will be moved when its users migrate.

IMPORTANT DEADLINES

In the final phase of the migration process, all users receive a notification email and are asked to make themselves familiar with the new systems. Here is a list of important dates that our users should keep in mind when planning to use our systems in the time period between November 2017 and February 2018.

Date	Migration Event	System
November 6, 2017	Scheduling halted for all nodes with more than 24 cores	SW ("old system")
December 1, 2017	User notification by email; all users receive access to new systems	Frontenac ("new system")
January 3, 2018	Data synchronization stops; user data that differ after this date must be transferred by users; Grid Engine scheduling disabled (nodes "draining")	SW ("old system")
January 19, 2018	All running jobs are terminated; remaining hardware is transferred to new system	SW ("old system")
January 26, 2018	User access to sflogin0/swlogin1 closed; SNOlab (SX) cluster jobs terminated; SNOlab (SX) login nodes closed	SW ("old system")

Until year-end, we are continuously "syncing" user data from the old to the new systems. Note that these are two independent copies of the data. This synchronization stops after January 3, 2018. After this date, it is the responsibility of the user to transfer data from the old to the new system if desired. If you encounter inconsistencies and need assistance, please contact us.
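
Once synchronization has stopped, a manual transfer can be done with rsync. The sketch below only builds and prints the command so that it can be inspected before it is run. It assumes the command is issued on the old system, that the new login node is reachable from there over SSH, and that /path/on/frontenac stands in for your actual target directory; the helper name build_rsync_cmd is hypothetical.

```shell
#!/bin/sh
# Build (but do not run) an rsync command that copies one directory from
# the old system to the new one.
#   -a  archive mode (recursive, preserves permissions and timestamps)
#   -u  skip files that are newer on the receiving side
#   -v  verbose output
#   -n  dry run: only list what would be transferred
# The trailing slashes make rsync copy the contents of the source
# directory into the destination directory.
build_rsync_cmd() {
    src_dir=$1      # directory on the old system, e.g. /home/hpcXXXX
    dest=$2         # user@host:path on the new system (placeholder path)
    mode=$3         # "dry" for a dry run, anything else for a real copy
    flags="-auv"
    [ "$mode" = "dry" ] && flags="-auvn"
    printf 'rsync %s %s/ %s/\n' "$flags" "$src_dir" "$dest"
}

# Print a dry-run command for a home directory.
build_rsync_cmd /home/hpcXXXX hpcXXXX@login.cac.queensu.ca:/path/on/frontenac dry
```

Running the printed dry-run command first shows which files would be copied; dropping the -n flag then performs the transfer.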

Migration Schedule

The migration proceeds according to a scheme that was devised to minimize the impact on operations and users' research activities. Research groups migrate as a whole during a 1-month time period. The migration procedure has three steps:

1 - Initiation of migration process
- Email notification of the user (mid-November).
- Create account on new cluster.
- Issue temporary credentials to the new cluster and request initial login to change password.
2 - Rolling rsync of user data
- Will be repeated until update requires less than 2 hrs
  - /home/hpcXXXX
  - /u1/work/hpcXXXX
  - /scratch/hpcXXXX if required
  - other directories if required
- Users can access both new and old systems for 1 month.
  - Data on the old system that are newer than on the new one are rsync'ed.
3 - Final migration
- Final rsync.
- Jobs on old cluster are terminated.
- User access to old system closed.
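
The "rolling rsync" in step 2 can be pictured as a loop that repeats a synchronization pass until a single pass finishes faster than a threshold (2 hours in the procedure above). The following is only an illustration of that idea, not the tooling used by CAC staff: the function name rolling_sync, its arguments and the cap on the number of passes are assumptions, and the command true stands in for the real synchronization command.

```shell
#!/bin/sh
# rolling_sync THRESHOLD_SECONDS MAX_PASSES COMMAND [ARGS...]
# Runs COMMAND repeatedly until one pass takes less than THRESHOLD_SECONDS.
# Returns 0 when a pass was fast enough, 1 if COMMAND failed,
# and 2 if MAX_PASSES was reached without a fast pass.
rolling_sync() {
    rs_threshold=$1
    rs_max=$2
    shift 2
    rs_pass=0
    while [ "$rs_pass" -lt "$rs_max" ]; do
        rs_pass=$((rs_pass + 1))
        rs_start=$(date +%s)
        "$@" || return 1
        rs_elapsed=$(( $(date +%s) - rs_start ))
        echo "pass $rs_pass took ${rs_elapsed}s"
        # Little data changed since the previous pass: ready for the final rsync.
        if [ "$rs_elapsed" -lt "$rs_threshold" ]; then
            return 0
        fi
    done
    return 2
}

# 7200 seconds = 2 hours; "true" is a stand-in for the real sync command.
rolling_sync 7200 5 true
```

Each pass only has to move what changed since the previous one, so the passes get shorter until the final rsync in step 3 can be done quickly.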

Migration Q&A

Q: Who migrates ?

A: All of our users will migrate from the old SW cluster to the new "Frontenac" cluster

Q: Can I use my old "stuff" ?

A: Much of the old data and software will be usable on the new systems. However, the data will have to be copied over as the new systems use a separate file system, and cross access is not possible.

Q: Do I have to re-compile ?

A: It is possible that you will have to re-compile some of the software you are using. We will assist you with this.

Q: Do I copy my files over myself ?

A: Initially, we transfer your data for you. This synchronization process will end on December 15. If you are still altering your data after this date, it is your responsibility to transfer the data manually.

Q: Is this optional ?

A: No. We move both user data and hardware according to a schedule.

Q: Can I decide when to move ?

A: We are open to "early adopters", but we cannot grant extensions on the old systems.

Q: Will this disrupt my research ?

A: Moving hardware and users causes unavoidable scheduling bottlenecks, as substantial portions of the clusters have to be kept inactive to "drain". Also, in the intermediate period when one cluster is dismantled and the other is being built up, both are substantially smaller. Larger jobs in particular will be hard or impossible to schedule between November 2017 and February 2018.

Q: How are resources allocated on the new cluster ?

A: Please read through our help file "Resource Allocations on Frontenac".

Help

If you have questions that you can't resolve by checking the documentation, email us at cac.help@queensu.ca.

Difference between revisions of "Frontenac:Migration"


@@ Line 1: / Line 1: @@
-= Migrating to the new '''Frontenac''' (CAC) cluster =
+= '''Quickstart''' =
-'''!!! This guide is seriously under construction. Please do not rely on anything you read here until this warning is removed !!!'''
+If your account has been migrated to Frontenac, you can gain access to the new system by following the steps below.
-This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").
+* Visit https://login.cac.queensu.ca/pwr to obtain a temporary password. You must use the original email you registered with.
-{|  style="border-spacing: 8px;"
+* Logon to the new system using a SSH client (MobaXterm on Windows, Terminal on macOS/Linux): <code>ssh yourUsername@login.cac.queensu.ca</code>. The first time you login, the system will prompt you to change your temporary password, then log you out (so you can test logging in with the new password).
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
-== Migration Q&A ==
+'''A set of guides on how to:'''
-* '''Q''': Who migrates ?
+* [[Access:Frontenac|... log into the system]]
-: '''A''': Eventually, all of our users will migrate from the old SW (Linux) cluster to the new "Frontenac" (CAC) cluster
+* [[Software:Frontenac|... setup and use software]]
+* [[Filesystems:Frontenac|... find your way around the filesystems]]
+* [[SLURM|... submit jobs using SLURM]]
-* '''Q''': Can I use my old "stuff" ?
+= Migrating to the new Frontenac cluster =
-: '''A''': Much of the old data and software will be usable on the new systems. However, the data will have to be copied over as the new systems use a separate file system, and cross access is not possible.
-* '''Q''' Do I have to re-compile ?
+This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").
-: '''A''': It is possible that you will have to re-compile some of the software you are using. We will assist you with this.
-* '''Q''': Do I copy my files over myself ?
-: '''A''': No. We will do this for you in the course of the coming months.
-* '''Q''': Is this optional ?
-: '''A''': Unfortunately not. We will move both user data and hardware according to a schedule.
-* '''Q''': Can I decide when to move ?
-: '''A''': To a degree. We are open to "early adopters". Once they have been moved we move the bulk of users according to our schedule.
-* '''Q''': Will this disrupt my research ?
-: '''A''': We will do our level best to keep disruptions to a minimum. We will give you a chance to "practise" on the new systems while you still have access to the old ones. Once you are on the new systems, access to the old ones will be cut to preserve data integrity.
-|}
-{|  style="border-spacing: 8px;"
+'''Note: We are in the final phase of the migration process. All users will gain access to the new systems by mid-November, and lose acess to the old systems in early January 2018. Scheduling of new jobs on the old system will stop in mid-December! Please make yourself familiar with the new systems.'''
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
 == Why migrate ? ==
@@ Line 39: / Line 24: @@
 Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems, and their replacement by new X86/Intel based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced it completely. However, this gradual replacement did not address issues in the base structure of that cluster, such as an old scheduler system, and a less than cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time for a re-design of our main compute cluster. Some of our storage components reach their "end of life" phase and will be retired within a year.
-Rather than permanently operating two separate clusters, we will gradually move both our users and the compute hardware from one cluster/network to the other. We will do so over the course of months to give individual users plenty of time to familiarize themselves with the new environment and "wrap up" their work on the old one, thus minimizing the impact on their research. However, in the interest of consistency, we can not make this process optional. '''We must move all our users to the new cluster by early 2018''' when service contracts for the old components run out.
+Rather than permanently operating two separate clusters, we will move both our users and the compute hardware from one cluster/network to the other. In the interest of consistency, we can not make this process optional. '''We must move all our users to the new cluster by early 2018''' when service contracts for the old components run out.
-|}
-{|  style="border-spacing: 8px;"
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
 == What's Different ? ==
-The new cluster is based on a newer version of the same operating system "CentOS". We have replaced the scheduler by a "new generation" one called SLURM, which is the same as is used on the new Compute Canada "GP" systems. We also replaced our "environment management system" by the more powerful and standard "lmod". Here are the main changes in table format.
+The new cluster is based on a newer version of the same CentOS operating system. We have replaced the scheduler with SLURM, which is the same as is used on the new Compute Canada "GP" systems. We also replaced the "use" system by the more powerful and standard "lmod". Here are the main changes in table format.
-{| class="wikitable" style="float:left; margin-right: 25px;"
+{| class="wikitable" | '''Difference between "old" SW (Linux) and "new" CAC (Frontenac) clusters'''
-!colspan="4"| '''Difference between "old" SW (Linux) and "new" CAC (Frontenac) clusters'''
 |-
 |
@@ Line 57: / Line 36: @@
 |'''new CAC (Frontenac) cluster'''
 |-
-| '''Operating System'''
+| '''Operating system'''
 | CentOS 6
-| CentOS 7
+| [https://wiki.centos.org/ CentOS] 7
 |-
-| '''File System type'''
+| '''File system type'''
-| NFS
+| ZFS
 | [https://www.ibm.com/support/knowledgecenter/en/SSFKCN/gpfs_welcome.html GPFS]
 |-
@@ Line 69: / Line 48: @@
 | [https://slurm.schedmd.com/ SLURM]
 |-
-| '''Software Manager'''
+| '''Software manager'''
 | usepackage
 | [https://lmod.readthedocs.io/en/latest/ lmod]
 |-
 | '''Backup management'''
-| ???
+| samfs
 | [https://en.wikipedia.org/wiki/Hierarchical_storage_management Hierarchical Storage Management (HSM)]
 |}
-|}
+== Migration Time Table ==
-{|  style="border-spacing: 8px;"
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
-== Migration Schedule ==
+Different users will migrate at different times. We have been moving data to the new file system for months, so that at the time when "it's your turn" your data will already be available on the new system. Here is a month-by-month outline of who will move when. If you want to migrate ahead of schedule, or you have compelling reasons to delay the move, please get in touch with us at cac.help@queensu.ca
-The migration proceeds according to a scheme that was devised to minimize the impact on operations and user's research activities. Research groups migrate as a whole during a 3-4 week time period. The migration procedure has four steps:
+{| class="wikitable" | '''Difference between "old" SW (Linux) and "new" CAC (Frontenac) clusters'''
+|-
-* '''1 - Initiation of migration process'''
+|'''Month (2017)'''
-** Contact PI to determine migration requirements and schedule a time.
+|'''Who moves ?'''
-** Contact all researchers and issue temporary credentials to the new cluster.
+|-
-** Create temporary account on new cluster with restricted access (sandbox).
+| September
-* '''2 - Rolling rsync of user data'''
+|
-** Will be repeated until update requires less than 2 hrs
+* De-actived users
-*** /home/hpcXXXX
+* User who have not run a scheduled job for > 6 months
-*** /u1/work/hpcXXXX
+* Volunteers
-*** /scratch/hpcXXXX ''if required''
+|-
-*** other directories ''if required''
+| October
-** Users access the new systems through a "sandbox" home directory
+|
-*** /global/migration/hpcXXXX
+* New accounts (i.e. new users will be going straight to Frontenac)
-*** This area is temporary and will be deleted at the end of the migration.
+* User who have not run a scheduled job for > 3 months
-** This period can take up to 4 weeks. The old cluster is still fully accessible.
+* Volunteers
-** Dedicated or associate hardware will also be moved and updated during this time period.
+|-
-* '''3 - Final group migration, user lockout'''
+| November
-** Necessary to ensure data integrity.
+|
-** Final rsync on "quiet" data.
+* New accounts (i.e. new users will be going straight to Frontenac)
-** At the end, access to old cluster is blocked.
+|-
-** All jobs on old cluster are terminated.
+| December
-* '''4 - Full access to new cluster.'''
+|
-** Home directory on new cluster becomes /global/home/hpcXXXX
+* New accounts (i.e. new users will be going straight to Frontenac)
-** Feedback requested from user.
+* '''Everyone'''
 |}
-{|  style="border-spacing: 8px;"
+We will transfer hardware from the "old" cluster (SW) to the new one (Frontenac) to accommodate the migrated users. This means that in the transition period, the old cluster will gradually become smaller while the new one grows. Dedicated hardware will be moved when its users migrate.
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#f7f7f7; border-radius:7px" |
-== Compiling Code ==
+== '''IMPORTANT DEADLINES''' ==
-The standard Fortran/C/C++ compilers differ between the Solaris and the Linux systems. [[HowTo:Compilers|The ones on the x86/Linux platform are discussed here]]. Here is a comparison in table form. Since there are two compilers ('''gnu''' and '''Intel''') on the Linux platform, they are treated separately. The default is '''gnu'''. We also list the MPI - related commands for setup, compilation, and runtime.
+In the final phase of the migration process, all users receive a notification email and are asked to make themselves familiar wit the new systems. Here is a list of important dates that our users should keep in mind when planning to use our systems in the time period between November 2017 and February 2018.
+{| class="wikitable" | '''Important Migration Dates'''
-{| class="wikitable" style="float:left; margin-right: 25px;"
-!colspan="4"| '''Fortran/C/C++ Compilers Sparc/Solaris to x86/Linux'''
 |-
-|
+|'''Date'''
-|'''Sparc/Solaris'''
+|'''Migration Event'''
-|'''x86/Linux (gnu)'''
+|'''System'''
-|'''x86/Linux (Intel)'''
 |-
-| '''Name/Version'''
+| November 6, 2017
-| Studio 12.4
+| Scheduling halted for all nodes with more than 24 cores
-| Gnu gcc 4.4.7
+| SW ("old system")
-| Intel 12.1
 |-
-| '''Setup command'''
+| December 1, 2017
-| none (default)
+|
-| none (default)
+* User notification by email
-| use icsmpi
+* '''All users receive access to new systems'''
+| Frontenac ("new system")
 |-
-| '''MPI setup'''
+| January 3, 2017
-| none (default)
+|
-| use openmpi
+* '''Data synchronization stops'''
-| use icsmpi
+* User data that differ after this date must be transferred by users
+* Grid Engine '''scheduling disabled''' (nodes "draining")
+| SW ("old system")
 |-
-| '''Fortran / C / C++ compilers
+| January 19, 2018
-| f90 / cc / CC
+|
-| gfortran / gcc / g++
+* '''All running jobs are terminated'''
-| ifort / icc / icpc
+* Remaining hardware is transferred to new system
+| SW ("old system")
 |-
-| '''MPI compoiler wrappers'''
+| January 26, 2018
-| mpif90 / mpicc / mpiCC
+|
-| mpif90 / mpicc / mpicxx
+* User access to '''sflogin0/swlogin1 closed'''
-| mpiifort / mpiicc / mpiicpc
+* SNOlab (SX) cluster jobs terminated
-|-
+* SNOlab (SX) login nodes closed
-|'''MPI runtime environment'''
+| SW ("old system")
-| mpirun
-| mpirun
-| mpirun
 |}
-Note that '''all''' programs that were running on the Solaris platform have to be re-compiled on Linux. Binaries are not compatible as they are based on different instruction sets.
+Until year-end, we are continuously "syncing" user data from the old to the new systems. Note that these are two independent copies of the data. This synchronization stops after January 3, 2018. After this date, '''it is the responsibility of the user''' to transfer data from the old to the new system if desired. If you encounter inconsistencies and need assistance, please contact us.
-== MPI ==
+== Migration Schedule ==
-On both Solaris and Linux systems, the MPI distribution used is OpenMPI. On the Solaris platform this was integrated with the standard Studio compilers. On the Linux platform, two versions are in use:
+The migration proceeds according to a scheme that was devised to minimize the impact on operations and user's research activities. Research groups migrate as a whole during a 1-month week time period. The migration procedure has three steps:
-* A stand-alone version of OpenMPI 1.8 is used in combination with the gcc compiler and setup through the '''use openmpi''' command.
-* A second version (Intel 4.0 update 3) is used with the Intel compilers and set up together with them ("use icsmpi")
-All of these versions use the '''mpirun command''' to invoke the runtime environment. Check with '''which mpirun''' to see which version you are currently using.
-|}
-{|  style="border-spacing: 8px;"
-| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
-== New Scheduler ==
+* '''1 - Initiation of migration process'''
+** Email notification of the user (mid-November).
+** Create account on new cluster.
+** Issue temporary credentials to the new cluster and request initial login to change password.
+* '''2 - Rolling rsync of user data'''
+** Will be repeated until update requires less than 2 hrs
+*** ''/home/hpcXXXX ''
+*** ''/u1/work/hpcXXXX''
+*** ''/scratch/hpcXXXX if required''
+*** other directories ''if required''
+** Users can access both new and old systems for 1 month.
+*** Data on the old system that are newer than on the new one are rsync'ed.
+* '''3 - Final migration'''
+** Final rsync.
+** Jobs on old cluster are terminated.
+** User access to old system closed.
-== Scheduling ==
+== Migration Q&A ==
-Both the "old" M9000 servers and the "new" SW (Linux) cluster use Sun Grid Engine as a scheduler. Please consult [[HowTo:Scheduler|our Scheduler Help File]] for details about its usage. The following table gives an overview of the alterations that need to be made to a submission script if execution is to take place on the Linux production nodes, i.e. the "SW cluster".
+* '''Q''': Who migrates ?
+: '''A''': All of our users will migrate from the old SW cluster to the new "Frontenac" cluster
-{| class="wikitable" style="float:left; margin-right: 25px;"
+* '''Q''': Can I use my old "stuff" ?
-!colspan="3"| '''Changes in SGE submissions when migrating from Sparc/Solaris to x86/Linux'''
+: '''A''': Much of the old data and software will be usable on the new systems. However, the data will have to be copied over as the new systems use a separate file system, and cross access is not possible.
-|-
-|
-|'''Sparc/Solaris'''
-|'''x86/Linux'''
-|-
-| '''Queue name'''
-| m9k.q (old default, deprecated)
-| abaqus.q (new default)
-|-
-| '''Node names'''
-| m9k000*
-| sw00**, cac0**
-|-
-| '''Login node for <br> submission'''
-| sflogin0
-| swlogin1
-|-
-| '''Rel. Serial Execution Speed'''
-| 1
-| 3-6
-|-
-| '''Suggested Relative Nprocs'''
-| 1
-| 1/2
-|-
-| '''Queue specification <br> in submit script'''
-| none
-| none
-|-
-| '''Gaussian Parallel environment'''
-| <pre>#$ -pe gaussian.pe</pre>
-| <pre>#$ -pe glinux.pe</pre>
-|-
-| '''Gaussian Setup line'''
-| <pre>. /opt/gaussian/setup.sh</pre>
-| <pre>. /opt/gaussian/setup.sh</pre>
-|}
-Note that it is strongly suggested to '''lower the number of processes''' requested when submitting to the SW cluster. This is because the nodes are substantially smaller than then the M9000 servers, but provide greatly improved per-core performance. This means that even with half the core count, a speedup of 2-3 is likely.
+* '''Q''' Do I have to re-compile ?
+: '''A''': It is possible that you will have to re-compile some of the software you are using. We will assist you with this.
-We have added some entries to the table describing modifications that apply only for submissions of jobs running the Computational Chemistry software '''Gaussian'''. For more details about this software, please consult our [[HowTo:gaussian|Gaussian Help File]]. Gaussian submissions go to a dedicated large node on the SW cluster that uses local scratch space to improve performance and avoid bandwidth issues with IO.
+* '''Q''': Do I copy my files over myself ?
+: '''A''': Initially, we transfer your data for you. This synchronization process will end on December 15. If you are still altering your data after this date, it is your responsibility to transfer the data manually.
+* '''Q''': Is this optional ?
+: '''A''': No. We move both user data and hardware according to a schedule.
+* '''Q''': Can I decide when to move ?
+: '''A''': We are open to "early adopters", but we cannot grant extensions on the old systems.
+* '''Q''': Will this disrupt my research ?
+: '''A''': The moving of hardware and users causes unavoidable scheduling bottlenecks, as substantial portions of the clusters have to be kept inactive to "drain". Also, in the intermediate period when one cluster is dismantled and the other is being built up, both are substantially smaller. Especially larger jobs will be hard or impossible to schedule in the period between November'17 and February'18.
+* '''Q''': How are resources allocated on the new cluster ?
+: '''A''': Pleased read through our help file "[[Allocation|Resource Allocations on Frontenac]]"
 == Help ==
 If you have questions that you can't resolve by checking documentation, [mailto:cac.help@queensu.ca email to cac.help@queensu.ca].
-|}