== '''Outline''' ==
  
Frontenac is CAC's main compute cluster. Until March 31, 2019, allocations from the 2018 Resource Allocation Competition of Compute Canada ("RAC 2018") ran on this cluster.
  
This cluster is '''not''' among the allocatable systems for the 2019 Compute Canada allocation round ("RAC 2019"). '''Therefore, operation of Frontenac has been on a cost-recovery basis since April 1, 2019.''' For details about the fee structure, please see [[Frontenac:Fees|our Frontenac Fee Wiki page]]. This is an important change that affects both compute access and the usage of storage.
  
If you are looking for systems that offer cycles free of charge, please consult the available resources at Compute Canada. Allocations of larger resources require an application, but "opportunistic" use (Rapid Access Service, or RAS) is also available.
  
Compute Ontario has kindly offered to provide temporary storage (located at the Graham compute cluster) for those of our current users who need to move their data and presently do not have a sufficient allocation to accommodate it.
  
Note that the CAC team will continue to provide support for researchers from Queen's and other Ontario universities, irrespective of the systems they are working on. Our main commitment is to the researchers, not the platform.
  
=== '''Important Deadlines''' ===
  
{| class="wikitable" | '''Important Migration Dates'''
|-
|'''Date'''
|'''Migration Event'''
|-
| '''April 1, 2019'''
|
* Access to the compute cluster from free accounts ends
* Limited access for Queen's / Ontario users for data retrieval
* Jobs of free accounts terminated
* Jobs of free accounts from consortium members (Queen's, RMC, UofO, Carleton) allowed to finish
* Data for closed hpcXXXX accounts are purged for non-Ontario users
|-
| April 4, 2019
|
* End of access for RAC 2018 allocated users
|-
| '''September 1, 2019'''
|
* '''Data of non-paying Ontario users purged'''
* '''Accounts of non-paying users are de-activated'''
|}
  
=== Migrating off the Frontenac cluster ===
  
'''Note: Any data on the Frontenac file system must be moved off the system by April 2019.''' Retention of data beyond that time requires an arrangement involving fees; please consult [[Frontenac:Fees|our fee guide]] for details. The arrangement has to be made before the deadline of April 1, 2019 to avoid data purges.
  
{| class="wikitable" | '''Important Migration Dates'''
+
For technical information of how to upload/download data from our system, see [[UploadingFiles:Frontenac|our file transfer help page]]
 +
 
 +
Note that the responsibility of data migration lies with the user. We do not control access to the target systems and cannot do the data transfer for you. We will assist you with technical issues but you have to arrange for the disk space on the target system.
 +
 
 +
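
A minimal sketch for sizing up the transfer (the directory names below are placeholders; substitute your own Frontenac directories):

<pre>
# Summarize the size of your home and project directories
# (-s gives one total per argument, -h prints human-readable units)
du -sh /global/home/hpcXXXX /global/project/myproject
</pre>
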
=== Why ? ===

Unfortunately, operating expenses will no longer be covered by CFI MSI-2 resources. The only way we can continue to supply resources is by charging a fee to cover maintenance and operating costs. We do so on a cost-recovery basis, i.e. we charge as much as we must and as little as we can.

=== How ? ===

Data can be transferred to another system using scp/sftp from the command line or through a secure transfer client. Another option is to establish a Globus individual access point on our system. Please consult our [[UploadingFiles:Frontenac|guide about data transfer]]. Note that the transfer should be done using our login nodes ('''login.cac.queensu.ca''') or the data transfer node we installed for this purpose ('''transfer.cac.queensu.ca'''). If you encounter issues with moving data, please contact us through our [mailto:cac.help@queensu.ca ticketing system].
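
A minimal sketch of the command-line route (the usernames, paths, and the Graham destination are placeholders; adapt them to your own accounts and target system):

<pre>
# Log on to the data transfer node (login.cac.queensu.ca works as well)
ssh hpcXXXX@transfer.cac.queensu.ca

# Push a directory to an account on another system, e.g. Graham, with rsync
# (-a preserves permissions and timestamps, -v reports progress, -z compresses in transit)
rsync -avz /global/home/hpcXXXX/myproject/ myccuser@graham.computecanada.ca:/home/myccuser/myproject/

# A one-off copy with scp also works (-r copies directories recursively)
scp -r /global/home/hpcXXXX/myproject myccuser@graham.computecanada.ca:/home/myccuser/
</pre>

For very large transfers, the Globus option described in [[UploadingFiles:Frontenac|our data transfer guide]] is usually the more robust choice, as it can resume interrupted transfers.
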
=== Migration Time Table ===

{| class="wikitable" | '''Migration timeline for non-paying users'''
|-
|'''Date'''
|'''Migration Event'''
|-
| December 2018 - March 2019
|
* Monthly reminders about the need to migrate data
* RAC access, general "opportunistic" access continues
* Both access types are free
* Storage is free
* Users are asked to make arrangements for charged access
|-
| April 1, 2019
|
* '''Access changes from free to charged'''
* Users covered by agreement continue to access systems as before
* All jobs of users without agreement are terminated
* Non-paying users from outside Ontario:
** All access stops.
** Jobs terminated.
** Data are subject to purge.
** This includes users with a 2018 RAC allocation.
* Ontario non-paying users (including Queen's, CAC consortium, SNO, OBI, etc.):
** Access for data management still possible.
** Data are migrated to tape (nearline).
** No job submission.
** No access to compute cluster.
|-
| '''September 1, 2019'''
|
* '''All data not covered by charged accounts are subject to purge.'''
* '''This includes migrated data and backups (tape).'''
|}
  
=== Migration Q&A ===
  
* '''Q''': Will user support continue ?
: '''A''': Yes. Our team will continue to support researchers from Queen's and other Ontario universities irrespective of the systems they are working on. Our main commitment is to the researcher and their science, not to the platform.

* '''Q''': Who migrates ?
: '''A''': All users who currently hold an hpcXXXX (Compute Canada) free account and will not continue on a charged account.

* '''Q''': How much does it cost to continue using the system ?
: '''A''': [[Frontenac:Fees|A fee schedule can be found here]].

* '''Q''': Where can I continue to use resources for free ?
: '''A''': [https://www.computecanada.ca/research-portal/accessing-resources/available-resources/ Compute Canada systems] offer free resources based on allocations from a competition (RAC), or, to a limited extent, on "opportunistic" usage (RAS).

* '''Q''': Can I run my programs somewhere else ?
: '''A''': Most execution scripts, pipelines, etc. will still work on another system, as Frontenac is very similar to other Compute Canada systems.

* '''Q''': Do I copy my files over myself ?
: '''A''': Yes. We cannot do the data transfer for you. We will assist you with technical issues, but we cannot arrange for permanent space on other systems.

* '''Q''': I have a lot of data and don't know where to put it.
: '''A''': We have made arrangements with SHARCNET to store data temporarily on Graham. You can use the default space allocations there to store your data, and ask for extended storage if necessary. Keep in mind that this is meant as a temporary solution before you move the data to the system where you continue your work. (A sketch of checking your space on Graham follows this list.)

* '''Q''': Is this optional ?
: '''A''': No. The move must be done by April 1, 2019, as we have to cover our operating and maintenance costs.

* '''Q''': Can I decide when to move ?
: '''A''': Yes, as long as the move is completed by April 1.

* '''Q''': Will this disrupt my research ?
: '''A''': We hope that you continue your research on our systems and pay the [[Frontenac:Fees|associated fees]]. In that case, things will continue without any disruption. If you decide to move to another system that offers free access, you should plan the data migration as early as possible to smooth over any issues that the migration entails. We will do our best to assist you with this, but we cannot do it for you.

* '''Q''': I have a RAC 2018. Does this apply to me ?
: '''A''': Yes. The bulk of the data has to be moved by April 2019. This is particularly important if you are a researcher outside Ontario. We may be able to help make arrangements prior to the onset of the 2019 allocation on a case-by-case basis. [mailto:help@cac.queensu.ca Contact us].

* '''Q''': I have contributed hardware to CAC. Does this apply to me ?
: '''A''': Contributed accounts were issued on the basis of a prior agreement and will not be subject to this as long as the agreement continues.
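
As a follow-up to the questions about moving data yourself and about temporary storage on Graham, here is a minimal sketch (the usernames and paths are placeholders; <code>diskusage_report</code> is the storage reporting utility provided on the Compute Canada national clusters):

<pre>
# Log on to Graham with your Compute Canada credentials
ssh myccuser@graham.computecanada.ca

# Show current usage and quotas for your /home, /project and /scratch spaces
diskusage_report

# Pull a directory from Frontenac's transfer node into your Graham home directory
rsync -avz hpcXXXX@transfer.cac.queensu.ca:/global/home/hpcXXXX/results/ ~/results/
</pre>

If the default quotas are not sufficient to stage your data temporarily, ask for extended storage as described above.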
  
 
== Help ==
 
If you have questions that you can't resolve by checking the documentation, [mailto:cac.help@queensu.ca send an email to cac.help@queensu.ca].
