Allocation


Resource Allocations on the Frontenac Cluster

This Wiki entry explains how resources are shared on the CAC Frontenac cluster. This includes default allocations of compute time, as well as extended resources allocated by Compute Canada or provided through contributed systems. We also point out differences between the current Frontenac allocation scheme and the older scheme used on the now decommissioned SW/CAC clusters.

Fair share vs core/job restrictions

Our job scheduler on Frontenac is [https://slurm.schedmd.com/ SLURM]. All resource allocations and limitations are applied through this scheduler. For a basic intro on how to use it, please see our scheduler help file.
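For illustration, here is a minimal sketch of a SLURM batch script and the commands used to submit and monitor it (the job name, resource values, and file name are placeholders, not Frontenac defaults; consult the scheduler help file for recommended settings):

  #!/bin/bash
  #SBATCH --job-name=test_job      # name shown in the queue
  #SBATCH --time=00:10:00          # wall-clock limit (hh:mm:ss)
  #SBATCH --ntasks=1               # number of tasks (1 core for a serial job)
  #SBATCH --mem=1G                 # memory requested
  hostname                         # the actual work: print the compute node's name

Submit the script with "sbatch myjob.sh" and check its status with "squeue -u $USER".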

Every user on our systems has at least one SLURM account, the default account. Users with access to extended resources have additional accounts corresponding to those allocations. Each SLURM account carries intrinsic restrictions, and jobs can be scheduled against it only up to those limits.
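As a sketch of how this looks in practice (the account name rac-someproject and the script name myjob.sh are hypothetical), the accounts available to a user can be listed, and a job can be charged to a specific account at submission time:

  # List the SLURM accounts (associations) available to your user
  sacctmgr show associations user=$USER format=account,user,partition

  # Submit a job under the default account (used when no account is specified)
  sbatch myjob.sh

  # Submit a job under an extended-resource account, e.g. a hypothetical rac-someproject
  sbatch --account=rac-someproject myjob.sh

Jobs that request more than the chosen account's limits will not be scheduled.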


Comparison: Allocation on SW/CAC (SGE) vs Frontenac (SLURM)

The transition from the old SW/CAC allocation scheme (SGE) to the new Frontenac scheme (SLURM) followed this timeline:

November 6, 2017 (SW, "old system")
  • Scheduling halted for all nodes with more than 24 cores

December 1, 2017 (Frontenac, "new system")
  • User notification by email
  • All users receive access to the new systems

January 3, 2018 (SW, "old system")
  • Data synchronization stops
  • User data that differ after this date must be transferred by the users
  • Grid Engine scheduling disabled (nodes "draining")

January 19, 2018 (SW, "old system")
  • All running jobs are terminated
  • Remaining hardware is transferred to the new system

January 26, 2018 (SW, "old system")
  • User access to sflogin0/swlogin1 closed
  • SNOlab (SX) cluster jobs terminated
  • SNOlab (SX) login nodes closed