Frontenac:Migration
Revision as of 19:11, 17 November 2017
Quickstart
If your account has been migrated to Frontenac, you can gain access to the new system by following the steps below.
- Visit https://login.cac.queensu.ca/pwr to obtain a temporary password. You must use the email address you originally registered with.
- Log on to the new system using an SSH client (MobaXterm on Windows, Terminal on macOS/Linux):
ssh yourUsername@login.cac.queensu.ca
The first time you log in, the system will prompt you to change your temporary password and then log you out, so that you can test logging in with the new password.
A set of guides on how to:
- ... log into the system
- ... setup and use software
- ... find your way around the filesystems
- ... submit jobs using SLURM
Migrating to the new Frontenac cluster
This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").
Note: We are in the final phase of the migration process. All users will gain access to the new systems by mid-November and will lose access to the old systems in early January 2018. Scheduling of new jobs on the old system will stop in mid-December! Please familiarize yourself with the new systems.
Why migrate?
Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems and their replacement by new x86/Intel-based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced that cluster's older hardware completely. However, this gradual replacement did not address issues in the base structure of the cluster, such as an outdated scheduler and a less-than-cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time to redesign our main compute cluster. In addition, some of our storage components are reaching their "end of life" phase and will be retired within a year.
Rather than permanently operating two separate clusters, we will move both our users and the compute hardware from one cluster/network to the other. In the interest of consistency, we cannot make this process optional. We must move all our users to the new cluster by early 2018, when service contracts for the old components run out.
What's Different?
The new cluster is based on a newer version of the same CentOS operating system. We have replaced the scheduler with SLURM, the same scheduler used on the new Compute Canada "GP" systems. We also replaced the "use" system with the more powerful and widely adopted "lmod". Here are the main changes in table format.
Feature | SW (Linux) cluster (old) | CAC (Frontenac) cluster (new) |
Operating system | CentOS 6 | CentOS 7 |
File system type | ZFS | GPFS |
Scheduler | Sun Grid Engine (SGE) | SLURM |
Software manager | usepackage | lmod |
Backup management | samfs | Hierarchical Storage Management (HSM) |
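As a rough aid for the scheduler and software-manager changes listed in the table above, the shell sketch below maps a few everyday SGE and usepackage commands to their closest SLURM and lmod counterparts, and rewrites some common "#$" job-script directives as "#SBATCH" directives. The function names frontenac_equivalent and translate_directives are hypothetical helpers for illustration, not tools installed on Frontenac, and the coverage is deliberately small: SLURM options do not map one-to-one onto SGE options, so treat the output as a starting point only.

```shell
#!/usr/bin/env bash
# Memory aid for moving from the old SW cluster to Frontenac (a sketch, not an
# official conversion tool). Options and output formats differ between the old
# and new tools, so always check the translated result by hand.

# Print the closest Frontenac command for a common SW-cluster command.
frontenac_equivalent() {
  case "$1" in
    qsub)  echo "sbatch" ;;       # submit a batch job script (SGE -> SLURM)
    qstat) echo "squeue" ;;       # list queued and running jobs
    qdel)  echo "scancel" ;;      # cancel a job
    qhost) echo "sinfo" ;;        # show node / partition status
    use)   echo "module load" ;;  # load a software package (usepackage -> lmod)
    *)     echo "unknown"; return 1 ;;
  esac
}

# Rewrite a few common SGE job-script directives ("#$ ...") as SLURM
# directives ("#SBATCH ..."). Reads a job script on stdin, writes to stdout.
# Only a single h_rt resource per "-l" line is handled. Lines that match none
# of the patterns pass through unchanged and need manual review
# (e.g. "#$ -cwd", parallel environments, queue selection).
translate_directives() {
  sed -E \
    -e 's/^#\$ -N (.+)$/#SBATCH --job-name=\1/' \
    -e 's/^#\$ -l h_rt=(.+)$/#SBATCH --time=\1/' \
    -e 's/^#\$ -o (.+)$/#SBATCH --output=\1/' \
    -e 's/^#\$ -e (.+)$/#SBATCH --error=\1/' \
    -e 's/^#\$ -M (.+)$/#SBATCH --mail-user=\1/'
}
```

For example, translate_directives < old_job.sh > new_job.sh produces a draft script to review before submitting it with sbatch new_job.sh. The SLURM guide mentioned at the top of this page remains the reference for job submission.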
Migration Time Table
Different users will migrate at different times. We have been moving data to the new file system for months, so that when "it's your turn", your data will already be available on the new system. Here is a month-by-month outline of who will move when. If you want to migrate ahead of schedule, or you have compelling reasons to delay the move, please get in touch with us at cac.help@queensu.ca.
Month (2017) | Who moves? |
September |
|
October |
|
November |
|
December |
|
We will transfer hardware from the "old" cluster (SW) to the new one (Frontenac) to accommodate the migrated users. This means that in the transition period, the old cluster will gradually become smaller while the new one grows. Dedicated hardware will be moved when its users migrate.
IMPORTANT DEADLINES
In the final phase of the migration process, all users receive a notification email and are asked to familiarize themselves with the new systems. Here is a list of important dates to keep in mind when planning to use our systems between November 2017 and February 2018.
Date | Migration Event | System |
November 6, 2017 | Scheduling halted for all nodes with more than 24 cores | SW ("old system") |
November 20, 2017 | User notification by email | Frontenac ("new system") |
December 15, 2017 | | SW ("old system") |
January 3, 2018 | | SW ("old system") |
January 8, 2018 | | SW ("old system") |
Until December 15, we continuously sync user data from the old system to the new one. Note that these are two independent copies of the data: the synchronization runs only from the old system to the new one, so changes made on Frontenac are not copied back. The synchronization stops after December 15. From then on, it is the responsibility of the user to transfer data from the old to the new system if desired. If you encounter inconsistencies and need assistance, please contact us.
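After December 15, copying data is up to you. One way to do it is to push a directory from the old system to Frontenac with rsync. The sketch below assumes that you run it on the old SW cluster, that SSH connections from there to login.cac.queensu.ca are permitted, and that you choose the destination path on Frontenac yourself; push_to_frontenac and the DRY_RUN switch are hypothetical names for illustration. The rsync option --update skips files that are already newer on the receiving side, so files you have since modified on Frontenac are not overwritten.

```shell
#!/usr/bin/env bash
# Sketch: push one directory from the old SW cluster to Frontenac with rsync.
# Run this ON THE OLD SYSTEM. Assumes SSH from the old cluster to
# login.cac.queensu.ca works with your Frontenac credentials. The destination
# path on Frontenac is your choice; no particular layout is implied.
push_to_frontenac() {
  if [ "$#" -ne 3 ]; then
    echo "usage: push_to_frontenac USER SRC_DIR DEST_DIR" >&2
    return 2
  fi
  # Normalize both paths to end in exactly one "/" so rsync copies the
  # directory's contents rather than nesting the directory itself.
  local user="$1" src="${2%/}/" dest="${3%/}/"
  local remote="${user}@login.cac.queensu.ca:${dest}"
  # -a: recurse and preserve permissions/timestamps; -v: list transferred files
  # --update: skip files that are already newer on Frontenac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "rsync -av --update $src $remote"   # show, don't run
  else
    rsync -av --update "$src" "$remote"
  fi
}
```

For example, DRY_RUN=1 push_to_frontenac yourUsername /u1/work/hpcXXXX/results results only prints the rsync command; drop DRY_RUN=1 to actually run it.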
Migration Schedule
The migration follows a scheme devised to minimize the impact on operations and on users' research activities. Research groups migrate as a whole during a one-month period. The migration procedure has three steps:
- 1 - Initiation of migration process
  - Users are notified by email (mid-November).
  - An account is created on the new cluster.
  - Temporary credentials for the new cluster are issued, and users are asked to log in once to change their password.
- 2 - Rolling rsync of user data
  - Repeated until an update takes less than 2 hours, covering:
    - /home/hpcXXXX
    - /u1/work/hpcXXXX
    - /scratch/hpcXXXX, if required
    - other directories, if required
  - Users can access both the new and the old system for 1 month.
    - Data on the old system that are newer than their copies on the new system are rsync'ed.
    - Repeated until an update takes less than 2 hours.
- 3 - Final migration
  - Final rsync.
  - Jobs on the old cluster are terminated.
  - User access to the old system is closed.
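The rolling rsync in step 2 converges because rsync by default only transfers files that changed since the previous pass, so successive passes get shorter until one finishes under the threshold, which keeps the final sync short. The sketch below illustrates only that stopping rule; it is not CAC's actual migration tooling, and rolling_sync, its threshold argument (in seconds) and its pass limit are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "rolling rsync" stopping rule: repeat a sync pass until one
# pass finishes faster than a threshold (the procedure above uses 2 hours),
# with an upper limit on the number of passes. The sync command is passed in
# as the remaining arguments.
rolling_sync() {
  local threshold="$1" max_passes="$2"
  shift 2
  local pass=0 start elapsed
  while [ "$pass" -lt "$max_passes" ]; do
    pass=$((pass + 1))
    start=$(date +%s)
    "$@" || return 1                       # run one sync pass; stop on failure
    elapsed=$(( $(date +%s) - start ))
    echo "pass $pass took ${elapsed}s"
    if [ "$elapsed" -lt "$threshold" ]; then
      echo "converged after $pass pass(es)"  # short enough for the final sync
      return 0
    fi
  done
  echo "not converged after $max_passes pass(es)" >&2
  return 2
}
```

For example, rolling_sync 7200 10 rsync -a /home/hpcXXXX/ /path/to/copy/ (destination hypothetical) repeats the rsync until a pass takes less than 2 hours (7200 s), giving up after 10 passes.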
Migration Q&A
- Q: Who migrates?
- A: All of our users will migrate from the old SW cluster to the new "Frontenac" cluster.
- Q: Can I use my old "stuff"?
- A: Much of the old data and software will be usable on the new systems. However, the data have to be copied over, because the new systems use a separate file system and cross-access is not possible.
- Q: Do I have to recompile?
- A: You may have to recompile some of the software you are using. We will assist you with this.
- Q: Do I copy my files over myself?
- A: Initially, we transfer your data for you. This synchronization process ends on December 15. If you keep modifying data on the old system after this date, it is your responsibility to transfer it manually.
- Q: Is this optional?
- A: No. We move both user data and hardware according to a schedule.
- Q: Can I decide when to move?
- A: We are open to "early adopters", but we cannot grant extensions on the old systems.
- Q: Will this disrupt my research?
- A: Moving hardware and users causes unavoidable scheduling bottlenecks, because substantial portions of the clusters have to be kept inactive to "drain". Also, in the intermediate period when one cluster is being dismantled and the other is being built up, both are substantially smaller. Larger jobs in particular will be hard or impossible to schedule between November 2017 and February 2018.
Help
If you have questions that you can't resolve by checking the documentation, email cac.help@queensu.ca.