Resource Allocations on the Frontenac Cluster
This wiki entry explains how resources are shared on the CAC Frontenac cluster. It covers default compute-time allocations as well as extended resources that were allocated by Compute Canada or that come from contributed systems. It also points out differences between the current Frontenac allocation scheme and the older scheme used on the now-decommissioned SW/CAC clusters.
Fair share vs core/job restrictions
Our job scheduler on Frontenac is SLURM (https://slurm.schedmd.com/). All resource allocations and limitations are applied through this scheduler. For a basic introduction to using it, please see our scheduler help file.
Every user on our systems has at least one SLURM account, the default account. Users with access to extended resources have additional accounts corresponding to these allocations. Each SLURM account carries its own restrictions, and jobs can be scheduled under it up to those limits.
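As an illustration of how a job is tied to one of these accounts, the following minimal Python sketch assembles an `sbatch` command line. The `--account` and `--partition` options are standard `sbatch` flags. The account name `rac-example` and the script name `job.sh` are hypothetical placeholders, and whether the partition must be given explicitly on Frontenac is an assumption made here for illustration only.

```python
import shlex


def build_sbatch_command(script, account=None, partition=None):
    """Return the argument list for submitting `script` with sbatch.

    If `account` is omitted, SLURM charges the job to the user's
    default account.
    """
    argv = ["sbatch"]
    if account:
        argv.append(f"--account={account}")
    if partition:
        argv.append(f"--partition={partition}")
    argv.append(script)
    return argv


# Job charged to the default account (no extra flags needed).
default_cmd = build_sbatch_command("job.sh")

# Job charged to an extended allocation; "rac-example" is a made-up name.
rac_cmd = build_sbatch_command("job.sh", account="rac-example", partition="reserved")

print(shlex.join(default_cmd))
print(shlex.join(rac_cmd))
```

The list returned by `build_sbatch_command` can be passed to `subprocess.run` on a login node. Here it is only printed so that the sketch runs without a scheduler present.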
Comparison: Allocation on SW/CAC (SGE) vs Frontenac (SLURM)
Allocation Feature | SW/CAC (SGE) | Frontenac (SLURM)
Default allocation (compute) | 48 core limit; excludes dedicated systems; job limits may apply | 50 core-years over 1 year; standard partition; low priority (1); no scheduling to "large nodes"; job limits may apply
RAC allocation (compute) | fixed core limit; according to allocation from Compute Canada; associated with dedicated node; scheduled to specific node by user | fixed core-years allocation over one year; allocation from Compute Canada (less than requested); scheduled to "reserved" partition at higher priority (5); no dedicated resources, reserved partition; compete with other users of same priority
Contributed allocation (compute) | fixed core limit depending on size of contributed system; allocation depends on system size; associated with dedicated node; scheduled to specific node by user | fixed core-years allocation over one year; allocation depends on system size; scheduled to "reserved" partition at very high priority (10); no dedicated resources, reserved partition; compete with other users of same priority
Limits | core limits: default 48; temporary extensions available; RAC/contributed allocations run with extended limits; global job limits may apply | no permanent core limits; sanity core limits may apply to support fair share; usage lowers priority (fair share); RAC/contributed allocations increase priority; no global job limits
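Frontenac allocations in the table above are expressed in core-years rather than as a fixed core limit. The short Python sketch below shows the arithmetic behind that unit. It assumes a 365-day year, and the function names and the example job size (200 cores for 24 hours) are illustrative only, not taken from the scheduler configuration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def core_years_to_core_hours(core_years):
    """Convert an allocation in core-years to core-hours."""
    return core_years * HOURS_PER_YEAR


def fraction_of_allocation(cores, hours, allocation_core_years):
    """Share of an allocation consumed by a job using `cores` for `hours`."""
    used_core_hours = cores * hours
    return used_core_hours / core_years_to_core_hours(allocation_core_years)


# Default Frontenac allocation: 50 core-years over one year.
default_core_hours = core_years_to_core_hours(50)
print(default_core_hours)  # 438000

# A hypothetical 200-core job running for one day.
share = fraction_of_allocation(cores=200, hours=24, allocation_core_years=50)
print(f"{share:.2%}")  # 1.10%
```

Read this way, 50 core-years corresponds to an average of 50 cores kept busy for a full year, but a job may briefly use more cores than that, since usage is balanced through priority (fair share) rather than a permanent core limit.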