Difference between revisions of "SLURM Accounting"
Revision as of 16:55, 2 August 2016
Introduction to accounts
There are a number of partitions and settings on the cluster reserved for special purposes, such as RAC resource allocations or controlling usage of specific software packages. To use these, you will need to understand the basics of how SLURM performs access control and accounting.
For users, a SLURM account is simply an association between your user name and a particular usage account. These usage accounts may grant access to special partitions or otherwise give a user's jobs a higher priority. A user can be a member of multiple accounts, and can choose the account for a job at submission time. To view the accounts available to you, use the following command:
sacctmgr show associations where user=yourUserName
Example usage and output:
[jeffs@cac009 ~]$ sacctmgr show associations where user=jeffs
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
cac_workup       rac1      jeffs                    1                                                                                                                                              privileged
cac_workup        rac      jeffs                    1                                                                                                                                              privileged
cac_workup  snowflake      jeffs                    1                                                                                                                                                  normal
cac_workup    default      jeffs                    1                                                                                                                                                  normal
Here we can see that user "jeffs" has 4 usage accounts: rac1, rac, snowflake, and default. Accounts rac1 and rac have "privileged" queueing priority.
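The full listing is wide, so in a script it can be easier to work from sacctmgr's parsable output. The following is a minimal sketch under stated assumptions: list_accounts is a hypothetical helper (not part of SLURM), and it expects pipe-delimited "Account|QOS" lines such as those produced by the -n (no header), -P (parsable) and format= options of sacctmgr.

```shell
# list_accounts: hypothetical helper, not part of SLURM.
# Reads "Account|QOS" lines on stdin and prints one "account (qos)" per line.
list_accounts() {
    awk -F'|' 'NF >= 2 && $1 != "" { printf "%s (%s)\n", $1, $2 }'
}

# Assumed use on the cluster (requires sacctmgr on the PATH):
#   sacctmgr -n -P show associations where user=yourUserName format=account,qos | list_accounts
```

For the example user above, this would be expected to list rac1 and rac as privileged and snowflake and default as normal.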
Hidden partitions
Upon first logging in to the cluster, you are given the ability to run jobs under a default account. This default account has no special privileges or restrictions, and is able to submit to all partitions where access has not been restricted to specific groups. These partitions are visible with sinfo.
[jeffs@cac009 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
standard*    up 2-00:00:00      5   idle cac[002-006]
large        up 14-00:00:0      3   idle cac[007-009]
However, the cluster also has a number of "hidden" partitions. These partitions are hidden to avoid cluttering the output of commands like sinfo (most users will not want to view them). You can view them with sinfo -a.
[jeffs@cac009 ~]$ sinfo -a
PARTITION         AVAIL  TIMELIMIT  NODES  STATE NODELIST
standard*            up 2-00:00:00      5   idle cac[002-006]
large                up 14-00:00:0      3   idle cac[007-009]
rac-jobs             up   infinite      8   idle cac[002-009]
special-snowflake    up   infinite      8   idle cac[002-009]
debug                up   infinite      8   idle cac[002-009]
There are 3 hidden partitions on the test cluster: rac-jobs, special-snowflake, and debug. In this particular case, these are simply test partitions for the finalized cluster, but you can see that the 3 hidden partitions have more nodes available and no job length limit. Note that jobs submitted to these partitions will queue alongside jobs submitted by users under the default resource allocations. Submitting to one of these partitions does not confer any special queue priority; priority is instead controlled by a user's account. If you are given access to one of these hidden partitions, we will inform you of which partitions you are able to submit to.
Submitting jobs with a usage account
All jobs are submitted to SLURM under a particular usage account. If an account is not specified, the user's default account is used. To submit a job under a particular account, add "-A <accountName>" to the job script. To select a particular partition, use "-p partitionName". A job cannot be scheduled if the account it is submitted under does not have permission to run in the chosen partition. Note that jobs in hidden partitions do not appear in squeue's output unless the "-a" option is used.
An example job to be submitted to the rac-jobs partition under the rac account:
#!/bin/bash
#SBATCH -A rac
#SBATCH -p rac-jobs
#SBATCH -c 1
#SBATCH --mem=4000
#SBATCH -t 6:0:0
<actual job commands would go here>
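Because some accounts are limited in CPU minutes (see "Account usage and resource limits"), it can help to estimate the most a job could consume before submitting it. The following is a rough sketch, not part of SLURM: job_cpu_minutes is a hypothetical helper that multiplies the CPU count given to -c by the time limit given to -t. It assumes the limit is written in H:M:S form, as in the script above, and rounds any leftover seconds up to a full minute.

```shell
# job_cpu_minutes: hypothetical helper, not part of SLURM.
#   $1 = number of CPUs requested with -c
#   $2 = time limit requested with -t, in H:M:S form only
# Prints an upper bound on the CPU minutes the job could consume.
job_cpu_minutes() {
    echo "$2" | awk -F: -v cpus="$1" '{
        mins = $1 * 60 + $2      # hours and minutes as minutes
        if ($3 > 0) mins++       # round leftover seconds up
        print cpus * mins
    }'
}
```

For the example script (1 CPU, time limit 6:0:0), job_cpu_minutes 1 6:0:0 prints 360. A job that finishes early consumes less than this bound.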
Setting a default usage account
It is often convenient to set a default usage account so that the -A option does not have to be added to every job (for example, if you want all of your jobs to run under an account other than default). To change your default account (you are only able to modify your own):
sacctmgr modify user where name=UserName set DefaultAccount=AccountName
Account usage and resource limits
Some usage accounts may have set resource limits. Once these limits are exhausted, the account will be inactivated and a user will need to submit jobs under another account instead (such as the "default" account). To view an account's characteristics and resource limits, the command "sacctmgr show associations where account=AccountName" can be used. Here is an example account:
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
cac_workup       rac1                               1                                                   cpu=100                                                                                    privileged
cac_workup       rac1    hpc3293                    1                                                                                                                                              privileged
cac_workup       rac1      jeffs                    1                                                                                                                                              privileged
In this particular case, the "rac1" account has a limit of 100 CPU minutes of usage split between two users ("TRES" stands for "Trackable RESource"). Once the limit is reached, a user may no longer schedule jobs under this account (the "default" account is always available). If a job would exceed an account's maximum utilization, it will not be scheduled. The scheduler will indicate jobs that cannot be scheduled through "squeue -a": jobs will show that they cannot be scheduled due to "(AssocGrpCPUMinutesLimit)".
Utilization is tracked in CPU minutes: using 1 CPU for 1 minute consumes 1 CPU minute. By this logic, 32 CPUs for 1 minute and 1 CPU for 32 minutes both consume 32 CPU minutes. To view account utilization, use the following command:
sreport cluster AccountUtilizationByUser user=UserName start=YYYY-MM-DD
Example output (show jeffs's usage since June 1, 2016):
[jeffs@cac009 ~]$ sreport cluster AccountUtilizationByUser start=2016-06-01 user=jeffs
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2016-06-01T00:00:00 - 2016-08-01T23:59:59 (5356800 secs)
Use reported in TRES Minutes
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name     Used   Energy
--------- --------------- --------- --------------- -------- --------
cac_work+            rac1     jeffs                       16        0
cac_work+             rac     jeffs                       40        0
cac_work+       snowflake     jeffs                     3944        0
cac_work+         default     jeffs                      523        0
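Because an account's limit is shared by all of its users, comparing usage against the limit means adding up the usage of every user under that account. The following is a minimal sketch under stated assumptions: account_used is a hypothetical helper (not part of SLURM) that reads pipe-delimited "Account|Login|Used" lines and sums the Used column for one account. The format= field names in the comment are assumed from the sreport manual page and should be checked against the installed version.

```shell
# account_used: hypothetical helper, not part of SLURM.
# Reads "Account|Login|Used" lines on stdin and prints the total CPU minutes
# used under the account named in $1. Rows with an empty Login (account-level
# summary rows, if present) are skipped so usage is not counted twice.
account_used() {
    awk -F'|' -v acct="$1" '$1 == acct && $2 != "" { sum += $3 } END { print sum + 0 }'
}

# Assumed use on the cluster (requires sreport on the PATH):
#   sreport -n -P cluster AccountUtilizationByUser start=2016-06-01 \
#       format=Accounts,Login,Used | account_used rac1
```

Subtracting the result from the account's limit (100 CPU minutes for rac1 in the example above) gives a rough idea of how much of the allocation remains.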