Migrating to the new Frontenac (CAC) cluster
!!! This guide is seriously under construction. Please do not rely on anything you read here until this warning is removed !!!
This is a basic guide for users of our current CentOS 6 production systems ("SW cluster") to explain and facilitate migration to our new CentOS 7 systems ("Frontenac", "CAC cluster").
Migration Q&A
- Q: Who migrates?
- A: Eventually, all of our users will migrate from the old SW cluster to the new "Frontenac" cluster.
- Q: Can I use my old "stuff"?
- A: Much of the old data and software will be usable on the new systems. However, the data will have to be copied over, as the new systems use a separate file system and cross access is not possible.
- Q: Do I have to re-compile?
- A: It is possible that you will have to re-compile some of the software you are using. We will assist you with this.
- Q: Do I copy my files over myself?
- A: No. We will do this for you over the course of the coming months.
- Q: Is this optional?
- A: Unfortunately not. We will move both user data and hardware according to a schedule.
- Q: Can I decide when to move?
- A: To a degree. We are open to "early adopters". Once they have been moved, we will move the bulk of users according to our schedule.
- Q: Will this disrupt my research?
- A: We will do our level best to keep disruptions to a minimum. We will give you a chance to "practise" on the new systems while you still have access to the old ones. Once you are on the new systems, access to the old ones will be cut to preserve data integrity.
Why migrate?
Our systems underwent a substantial refresh last year with the retirement of the Solaris-based M9000 systems and their replacement by new x86/Intel-based hardware. This hardware was largely added to the existing "SW cluster" and eventually replaced it completely. However, this gradual replacement did not address issues in the base structure of that cluster, such as an old scheduler and a less than cutting-edge file system. To enable our users to make efficient use of the new hardware, we decided that it is time for a re-design of our main compute cluster. In addition, some of our storage components are reaching their "end of life" and will be retired within a year.
Rather than permanently operating two separate clusters, we will gradually move both our users and the compute hardware from one cluster/network to the other. We will do so over the course of months to give individual users plenty of time to familiarize themselves with the new environment and "wrap up" their work on the old one, thus minimizing the impact on their research. However, in the interest of consistency, we cannot make this process optional. We must move all our users to the new cluster by early 2018, when service contracts for the old components run out.
What's Different?
The new cluster is based on a newer version of the same CentOS operating system. We have replaced the scheduler with SLURM, the same scheduler used on the new Compute Canada "GP" systems, and we have replaced the "usepackage" software setup system with the more powerful and standard "lmod". The main changes are summarized in the table below, followed by a short command comparison.
Difference between "old" SW (Linux) and "new" CAC (Frontenac) clusters:

                    | old SW (Linux) cluster | new CAC (Frontenac) cluster
  Operating system  | CentOS 6               | CentOS 7
  File system type  | NFS                    | GPFS
  Scheduler         | Sun Grid Engine (SGE)  | SLURM
  Software manager  | usepackage             | lmod
  Backup management | ???                    | Hierarchical Storage Management (HSM)
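For day-to-day work, the scheduler and the software manager are the most visible changes. The sketch below pairs a few common commands on the old cluster with their rough equivalents on the new one; "gcc" and "myjob.sh" are examples only, and the modules actually installed on Frontenac may differ:

  # Software setup: usepackage (old) vs. lmod (new)
  use gcc              # old: set up a package ("gcc" is an example name)
  module load gcc      # new: load a module
  module avail         # new: list available modules

  # Batch jobs: Sun Grid Engine (old) vs. SLURM (new)
  qsub myjob.sh        # old: submit a job script
  sbatch myjob.sh      # new: submit a job script
  qstat                # old: show your jobs
  squeue -u $USER      # new: show your jobs
  qdel <jobid>         # old: cancel a job
  scancel <jobid>      # new: cancel a job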
Migration Schedule
The migration proceeds according to a scheme devised to minimize the impact on operations and on users' research activities. Research groups migrate as a whole during a 3-4 week time period. The migration procedure has four steps:
- 1 - Initiation of migration process
  - Contact the PI to determine migration requirements and schedule a time.
  - Contact all researchers and issue temporary credentials for the new cluster.
  - Create a temporary account on the new cluster with restricted access ("sandbox").
- 2 - Rolling rsync of user data (an illustrative sketch follows this list)
  - Repeated until an update requires less than 2 hrs, covering:
    - /home/hpcXXXX
    - /u1/work/hpcXXXX
    - /scratch/hpcXXXX if required
    - other directories if required
  - Users access the new systems through a "sandbox" home directory /global/migration/hpcXXXX
    - This area is temporary and will be deleted at the end of the migration.
  - This step takes up to 4 weeks. The old cluster remains fully accessible.
  - Associated hardware will also be moved / updated during this time period.
- 3 - Final group migration
  - User lockout, required for data integrity.
  - Final rsync on quiet data.
  - After this, access to the old cluster is blocked.
  - Jobs on the old cluster are terminated.
- 4 - Full access to the new cluster
  - Home directory on the new cluster becomes /global/home/hpcXXXX
  - Feedback requested from users.
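To make the "rolling rsync" of step 2 concrete, here is a minimal sketch of such a transfer for a single user. This is illustrative only, not the exact procedure our staff runs: the transfer host name "frontenac-dtn" and the subdirectory layout under the sandbox are assumptions.

  #!/bin/bash
  # Illustrative rolling rsync for one user; hpcXXXX stands in for a real
  # user ID, as in the directory names above. "frontenac-dtn" is a
  # placeholder for the actual transfer host.
  USER_ID=hpcXXXX
  for area in /home /u1/work /scratch; do
      # -a: preserve permissions/times/symlinks, -H: preserve hard links,
      # --partial: keep partially transferred files between passes
      time rsync -aH --partial "$area/$USER_ID/" \
          "frontenac-dtn:/global/migration/$USER_ID/$(basename "$area")/"
  done
  # Each pass copies only what changed since the previous one, so passes
  # get shorter; once a full pass completes in under ~2 hours, the group
  # is ready for the final sync of step 3.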
New cluster HowTo ...
- ... log into the system
- ... setup and use software
- ... find your way around the filesystems
- ... submit jobs using SLURM
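If you want a quick first taste of SLURM before reading the full page linked above, a minimal batch script looks like the sketch below. The module name and resource values are examples, not Frontenac-specific settings:

  #!/bin/bash
  #SBATCH --job-name=hello        # name shown by squeue
  #SBATCH --time=00:10:00         # wall-clock limit (HH:MM:SS)
  #SBATCH --ntasks=1              # a single task
  #SBATCH --mem=1G                # memory for the job

  # Load software through lmod ("gcc" is an example module name;
  # run "module avail" to see what is actually installed).
  module load gcc

  echo "Running on $(hostname)"

Save this as hello.sh, submit it with "sbatch hello.sh", and check its status with "squeue -u $USER".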
Help
If you have questions that you can't resolve by checking the documentation, email cac.help@queensu.ca.