Difference between revisions of "Hardware:Frontenac"

From CAC Wiki
 
(53 intermediate revisions by 5 users not shown)
Line 1: Line 1:
The Frontenac cluster is CAC's newest compute cluster. It features a new set of hardware, a new network configuration, a new scheduler, a new software module system, a new OS, and a new set of compilers and related software. This page is intended to give an overview of its capabilities and provide a migration guide for new users. Please note that user accounts and data are *not* shared between Frontenac and the SW cluster, although you may request that your data is copied over.
+
The Frontenac cluster is CAC's primary compute cluster. It runs the CentOS operating system and provides the SLURM scheduler, the Lmod software module system, and a set of compilers and related software. This page is intended to give an overview of its capabilities and provide a migration guide for new users.
  
The Frontenac cluster is expected to rapidly grow as nodes are migrated from the SW cluster. Currently the cluster consists entirely of 24 core (Intel Xeon CPU E5-2650 v4 @ 2.20GHz) x 256GB RAM nodes.
+
== Hardware ==
  
= Full documentation =
+
The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.<br clear=all>
  
'''A full migration guide can be found here: [[Migration:Frontenac|Frontenac cluster migration guide]]'''
+
{|  style="border-spacing: 8px;"
 +
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
  
 +
{| class="wikitable" style="float:left; margin-right: 25px;"
 +
!colspan="8"| '''Frontenac Cluster Nodes'''
 +
|-
 +
|'''Host'''
 +
|'''CPU model'''
 +
|'''Speed'''
 +
|'''Cores'''
 +
|'''Core(s) per socket'''
 +
|'''Sockets'''
 +
|'''Features'''
 +
|'''Memory'''
 +
|-
 +
| cac025
 +
| E7-4800 v3
 +
| 2.6 GHz
 +
| 48
 +
| 12
 +
| 4
 +
| avx2, sse3
 +
| 1 TB
 +
|-
 +
| cac026
 +
| E7-4800 v3
 +
| 2.6 GHz
 +
| 48
 +
| 12
 +
| 4
 +
| avx2, sse3
 +
| 1 TB
 +
|-
 +
| cac028
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 16
 +
| 8
 +
| avx2, sse3
 +
| 2 TB
 +
|-
 +
| cac029
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 16
 +
| 8
 +
| avx2, sse3
 +
| 2 TB
 +
|-
 +
!colspan="8"| [[File:x3950.jpg|thumb|left|600x600px|alt=Software (SW) Frontenac Cluster nodes]]
 +
|-
 +
| cac030
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 16
 +
| 8
 +
| avx2, sse3
 +
| 2 TB
 +
|-
 +
| cac031
 +
| E7-8867 v4
 +
| 2.3 GHz
 +
| 144
 +
| 18
 +
| 8
 +
| avx2, sse3
 +
| 1 TB
 +
|-
 +
| cac032
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 16
 +
| 8
 +
| avx2, sse3
 +
| 2 TB
 +
|-
 +
| cac033
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 16
 +
| 8
 +
| avx2, sse3
 +
| 2 TB
 +
|-
 +
| cac034
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac035
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac036
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac037
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac038
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac039
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac040
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac041
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac042
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac043
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac044
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac045
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac046
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac047
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac048
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac049
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac050
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac051
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac052
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac053
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac054
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac055
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac056
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac057
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac058
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac059
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac060
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac061
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac062
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac063
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac064
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac065
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac066
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac067
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac068
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3 
 +
| 256 GB
 +
|-
 +
| cac069
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac070
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac071
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac072
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac073
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac074
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac075
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac076
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac077
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac078
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac079
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac080
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac081
 +
| E5-2650 v4
 +
| 2.7 GHz
 +
| 24
 +
| 12
 +
| 2
 +
| avx2, sse3
 +
| 256 GB
 +
|-
 +
| cac100
 +
| 6226R
 +
| 3.6 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3
 +
| 191 GB
 +
|-
 +
| cac102
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac104
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 3xGP100 GPU
 +
| 191 GB
 +
|-
 +
| cac105
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 3xGP100 GPU
 +
| 191 GB
 +
|-
 +
| cac106
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 3xGP100 GPU
 +
| 191 GB
 +
|-
 +
| cac107
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 1xV100 GPU
 +
| 191 GB
 +
|-
 +
| cac108
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 1xV100 GPU
 +
| 191 GB
 +
|-
 +
| cac109
 +
| 6130
 +
| 2.1 GHz
 +
| 32
 +
| 16
 +
| 2
 +
| avx2, sse3, 1xV100 GPU
 +
| 191 GB
 +
|-
 +
| cac111<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref>
 +
| EPYC 7551P
 +
| 2.0 GHz
 +
| 32<ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref>
 +
| 32
 +
| 1
 +
| avx2, sse3, 1xTitan GPU
 +
| 128 GB
 +
|-
 +
| cac112<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref>
 +
| EPYC 7551P
 +
| 2.0 GHz
 +
| 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref>
 +
| 32
 +
| 1
 +
| avx2, sse3, 1xRTX4000 GPU
 +
| 128 GB
 +
|-
 +
| cac113<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref>
 +
| EPYC 7551P
 +
| 2.0 GHz
 +
| 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref>
 +
| 32
 +
| 1
 +
| avx2, sse3, 1xRTX4000 GPU
 +
| 128 GB
 +
|-
 +
| cac114<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref>
 +
| EPYC 7551P
 +
| 2.0 GHz
 +
| 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref>
 +
| 32
 +
| 1
 +
| avx2, sse3, 2xRTX4000 GPU
 +
| 128 GB
 +
|-
 +
| cac115<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref>
 +
| EPYC 7551P
 +
| 2.0 GHz
 +
| 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref>
 +
| 32
 +
| 1
 +
| avx2, sse3, 1xRTX4000 GPU
 +
| 128 GB
 +
|-
 +
| cac116
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac117
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac118
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac119
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac120
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac121
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac122
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac123
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac124
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac125
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac126
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac127
 +
| 6338
 +
| 2.0 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3
 +
| 512 GB
 +
|-
 +
| cac140
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac141
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac142
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac143
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac144
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac145
 +
| Epyc 7443
 +
| 2.8 GHz
 +
| 48
 +
| 24
 +
| 2
 +
| avx2, sse3, 2xA30 GPU 24GB
 +
| 512 GB
 +
|-
 +
| cac200
 +
| 8362
 +
| 2.8 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3, 2xA100 GPU
 +
| 512 GB
 +
|-
 +
| cac201
 +
| 6430
 +
| 2.1 GHz
 +
| 64
 +
| 32
 +
| 2
 +
| avx2, sse3, 2xL4 GPU
 +
| 256 GB
 +
|-
 +
|}
 +
|}
 +
 +
== Documentation ==
 +
 +
* [[Frontenac:Off|Migrating off the Frontenac System]]
 
* [[Access:Frontenac|Logging on to the system]]
 
 
* [[Software:Frontenac|List of installed software and how to use it]]
 
Line 13: Line 835:
 
* [[SLURM_Accounting|SLURM accounting and special job submission]]
 
  
= Quickstart =
+
== Quickstart ==
  
 
For those who want to just log on and get started with the new system, the bare essentials are shown below.
 
  
== Logging on ==
+
=== Logging on ===
  
 
Login to the Frontenac cluster is via SSH access only. You will need an SSH client like Terminal on Linux/macOS or [http://mobaxterm.mobatek.net/ MobaXterm] on Windows. To log on to the cluster, execute the following command in your SSH client of choice:
 
Line 25: Line 847:
 
The first time you log on, you will be prompted to accept this server's RSA key (<code>d0:9f:e9:e2:b0:fe:6b:56:bb:74:46:c5:fb:89:a4:41</code>). Type "yes" to proceed, then enter your password normally. No characters appear while typing your password.
 
  
== Filesystems ==
+
=== Filesystems ===
  
The Frontenac cluster uses a shared GPFS filesystem for all file storage. User files are located under <code>/global/home</code>, shared project space under <code>/global/project</code>, and network scratch space under <code>/global/scratch</code>. In addition to network storage, each compute node has a 1.5 TB local hard disk that jobs can use as fast local scratch space, at the location specified by the <code>$TMPDISK</code> environment variable.
+
The Frontenac cluster uses a shared GPFS filesystem for all file storage. User files are located under <code>/global/home</code>, shared project space under <code>/global/project</code>, and network scratch space under <code>/global/scratch</code>. In addition to network storage, each compute node has a 1.5 TB local hard disk that jobs can use as fast local scratch space, at the location specified by the <code>$TMPDISK</code> environment variable.
  
== Submitting jobs ==
+
=== Submitting jobs ===
  
 
Frontenac uses the SLURM scheduler instead of Sun Grid Engine. The <code>sbatch</code> command is used to submit jobs, <code>squeue</code> can be used to check the status of jobs, and <code>scancel</code> can be used to kill a job. For users looking to get started with SLURM as fast as possible, a minimalist template job script is shown below:
 
Line 45: Line 867:
 
</pre>
 
</pre>
  
Assuming our job is called <code>test-job.sh</code>, we can submit it with <code>sbatch test-job.sh</code>. Detailed documentation can be found on our [[SLURM | SLURM documentation page]]. One final thing to note is that it is possible to submit an interactive job with <code>srun --x11 --pty bash</code>. This starts a personal bash shell on a node with resources available.  
+
Assuming our job is called <code>test-job.sh</code>, we can submit it with <code>sbatch test-job.sh</code>. Detailed documentation can be found on our [[SLURM | SLURM documentation page]]. One final thing to note is that it is possible to submit an interactive job with <code>srun --x11 --pty bash</code>. This starts a personal bash shell on a node with resources available.
 +
 
 +
=== Accounts, Allocations, Partitions ===
  
== Migration guide ==
+
Please see our help file about [[Allocation|allocations on the Frontenac Cluster]].
  
Please see our [[Migration:Frontenac|Frontenac cluster migration guide]] for a full overview of the migration process.
+
----

Latest revision as of 13:51, 19 December 2023

The Frontenac cluster is CAC's primary compute cluster. It runs the CentOS operating system and provides the SLURM scheduler, the Lmod software module system, and a set of compilers and related software. This page is intended to give an overview of its capabilities and provide a migration guide for new users.

Hardware

The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.

Frontenac Cluster Nodes
Host | CPU model | Speed | Cores | Core(s) per socket | Sockets | Features | Memory
cac025 | E7-4800 v3 | 2.6 GHz | 48 | 12 | 4 | avx2, sse3 | 1 TB
cac026 | E7-4800 v3 | 2.6 GHz | 48 | 12 | 4 | avx2, sse3 | 1 TB
cac028 | E7-8867 v3 | 2.5 GHz | 128 | 16 | 8 | avx2, sse3 | 2 TB
cac029 | E7-8867 v3 | 2.5 GHz | 128 | 16 | 8 | avx2, sse3 | 2 TB
cac030 | E7-8867 v3 | 2.5 GHz | 128 | 16 | 8 | avx2, sse3 | 2 TB
cac031 | E7-8867 v4 | 2.3 GHz | 144 | 18 | 8 | avx2, sse3 | 1 TB
cac032 | E7-8867 v3 | 2.5 GHz | 128 | 16 | 8 | avx2, sse3 | 2 TB
cac033 | E7-8867 v3 | 2.5 GHz | 128 | 16 | 8 | avx2, sse3 | 2 TB
cac034 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac035 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac036 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac037 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac038 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac039 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac040 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac041 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac042 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac043 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac044 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac045 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac046 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac047 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac048 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac049 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac050 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac051 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac052 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac053 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac054 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac055 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac056 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac057 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac058 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac059 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac060 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac061 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac062 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac063 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac064 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac065 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac066 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac067 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac068 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac069 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac070 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac071 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac072 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac073 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac074 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac075 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac076 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac077 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac078 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac079 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac080 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac081 | E5-2650 v4 | 2.7 GHz | 24 | 12 | 2 | avx2, sse3 | 256 GB
cac100 | 6226R | 3.6 GHz | 32 | 16 | 2 | avx2, sse3 | 191 GB
cac102 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac104 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 3xGP100 GPU | 191 GB
cac105 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 3xGP100 GPU | 191 GB
cac106 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 3xGP100 GPU | 191 GB
cac107 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 1xV100 GPU | 191 GB
cac108 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 1xV100 GPU | 191 GB
cac109 | 6130 | 2.1 GHz | 32 | 16 | 2 | avx2, sse3, 1xV100 GPU | 191 GB
cac111[1] | EPYC 7551P | 2.0 GHz | 32[2] | 32 | 1 | avx2, sse3, 1xTitan GPU | 128 GB
cac112[1] | EPYC 7551P | 2.0 GHz | 32[2] | 32 | 1 | avx2, sse3, 1xRTX4000 GPU | 128 GB
cac113[1] | EPYC 7551P | 2.0 GHz | 32[2] | 32 | 1 | avx2, sse3, 1xRTX4000 GPU | 128 GB
cac114[1] | EPYC 7551P | 2.0 GHz | 32[2] | 32 | 1 | avx2, sse3, 2xRTX4000 GPU | 128 GB
cac115[1] | EPYC 7551P | 2.0 GHz | 32[2] | 32 | 1 | avx2, sse3, 1xRTX4000 GPU | 128 GB
cac116 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac117 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac118 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac119 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac120 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac121 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac122 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac123 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac124 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac125 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac126 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac127 | 6338 | 2.0 GHz | 64 | 32 | 2 | avx2, sse3 | 512 GB
cac140 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac141 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac142 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac143 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac144 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac145 | EPYC 7443 | 2.8 GHz | 48 | 24 | 2 | avx2, sse3, 2xA30 GPU 24GB | 512 GB
cac200 | 8362 | 2.8 GHz | 64 | 32 | 2 | avx2, sse3, 2xA100 GPU | 512 GB
cac201 | 6430 | 2.1 GHz | 64 | 32 | 2 | avx2, sse3, 2xL4 GPU | 256 GB
[Image: Software (SW) Frontenac Cluster nodes]
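
The node list above is a snapshot, and the scheduler can report the same per-node details live. The following is a minimal sketch, assuming the standard SLURM <code>sinfo</code> command is available on a login node; the function name <code>list_nodes</code> is made up for illustration and is not a CAC-provided command.

```shell
#!/bin/bash
# list_nodes is a hypothetical helper name, not a CAC-provided command.
# It prints one line per node: name, CPU count, memory in MB,
# feature tags, and generic resources (where GPUs are usually listed).
list_nodes() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: run this on a cluster login node" >&2
        return 1
    fi
    sinfo -N -o '%N %c %m %f %G'
}

# Usage on a login node (not run automatically):
#   list_nodes
```

How features and GPUs are labelled in the output depends on the site's SLURM configuration, so treat the column contents as something to inspect rather than rely on.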

Documentation

Quickstart

For those who just want to log on and get started with the system, the bare essentials are shown below.

Logging on

Access to the Frontenac cluster is via SSH only. You will need an SSH client such as Terminal on Linux/macOS or MobaXterm on Windows. To log on to the cluster, execute the following command in your SSH client of choice:

ssh -X yourUserName@login.cac.queensu.ca

The first time you log on, you will be prompted to accept the server's RSA key fingerprint (d0:9f:e9:e2:b0:fe:6b:56:bb:74:46:c5:fb:89:a4:41). Type "yes" to proceed, then enter your password. Nothing is displayed while you type your password.
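
The key fingerprint quoted above is in the older MD5 colon-separated form, while recent OpenSSH clients show SHA256 fingerprints by default. The sketch below shows one way to display the MD5 form for comparison, assuming OpenSSH's <code>ssh-keyscan</code> and <code>ssh-keygen</code> are installed and that the server still presents the same key. The helper names <code>normalize_fp</code> and <code>fetch_fp</code> are hypothetical.

```shell
#!/bin/bash
# Fingerprint quoted in the text above (MD5, colon-separated hex).
EXPECTED_FP='d0:9f:e9:e2:b0:fe:6b:56:bb:74:46:c5:fb:89:a4:41'

# normalize_fp and fetch_fp are hypothetical helper names.
# Drop an optional "MD5:" prefix and lowercase the hex digits.
normalize_fp() {
    printf '%s\n' "${1#MD5:}" | tr 'A-F' 'a-f'
}

# Fetch a host's RSA key and print its MD5 fingerprint (needs network access).
fetch_fp() {
    ssh-keyscan -t rsa "$1" 2>/dev/null | ssh-keygen -l -E md5 -f - | awk '{print $2}'
}

# Usage (not run automatically):
#   [ "$(normalize_fp "$(fetch_fp login.cac.queensu.ca)")" = "$EXPECTED_FP" ] \
#       && echo "fingerprint matches"
```

Alternatively, connecting with <code>ssh -o FingerprintHash=md5</code> should make the first-login prompt itself show the MD5 form.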

Filesystems

The Frontenac cluster uses a shared GPFS filesystem for all file storage. User files are located under /global/home, shared project space under /global/project, and network scratch space under /global/scratch. In addition to network storage, each compute node has a 1.5 TB local hard disk that jobs can use as fast local scratch space, at the location specified by the $TMPDISK environment variable.
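
A common pattern with node-local scratch is to copy input to the local disk, work there, and copy results back to network storage before the job ends. The following is a minimal sketch of that pattern, not a CAC-provided script: the function name <code>stage_and_run</code> and the file names are hypothetical, a line count stands in for the real computation, and the fallback to /tmp is an assumption for running outside a job where $TMPDISK may be unset.

```shell
#!/bin/bash
# stage_and_run is a hypothetical helper; the line count below stands in
# for a real, I/O-heavy computation.
# Usage: stage_and_run INPUT_FILE OUTPUT_FILE
stage_and_run() {
    input=$1
    output=$2
    # Fall back to /tmp when $TMPDISK is unset, e.g. when testing outside a job.
    workdir=$(mktemp -d "${TMPDISK:-/tmp}/job.XXXXXX") || return 1
    # Stage the input onto the node-local disk.
    cp "$input" "$workdir/input.dat" || return 1
    # Do the work on local scratch (here: count the lines of the input).
    grep -c '' "$workdir/input.dat" > "$workdir/result.txt"
    # Copy the result back to network storage, then clean up local scratch.
    cp "$workdir/result.txt" "$output" || return 1
    rm -rf "$workdir"
}
```

Inside a job script this might be called as <code>stage_and_run /global/scratch/yourUserName/input.dat /global/scratch/yourUserName/result.txt</code>, where both paths are placeholders.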

Submitting jobs

Frontenac uses the SLURM scheduler instead of Sun Grid Engine. The sbatch command is used to submit jobs, squeue can be used to check the status of jobs, and scancel can be used to kill a job. For users looking to get started with SLURM as fast as possible, a minimalist template job script is shown below:

#!/bin/bash
#SBATCH -c num_cpus                        # Number of CPUs requested. If omitted, the default is 1 CPU.
#SBATCH --mem=megabytes                    # Memory requested in megabytes. If omitted, the default is 1024 MB.
#SBATCH -t days-hours:minutes:seconds      # How long will your job run for? If omitted, the default is 3 hours.

# some demo commands to use as a test
echo 'starting test job...'
sleep 120
echo 'our job worked!'

Assuming our job is called test-job.sh, we can submit it with sbatch test-job.sh. Detailed documentation can be found on our SLURM documentation page. One final thing to note is that it is possible to submit an interactive job with srun --x11 --pty bash. This starts a personal bash shell on a node with resources available.
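
The template's placeholders can also be supplied on the sbatch command line, where options take precedence over #SBATCH lines in the script. The sketch below fills them in with example values (4 CPUs, 8000 MB, 2.5 hours) chosen purely for illustration; <code>to_slurm_time</code> is a hypothetical helper that converts seconds into the days-hours:minutes:seconds form used by -t.

```shell
#!/bin/bash
# to_slurm_time is a hypothetical helper: seconds -> days-hours:minutes:seconds.
to_slurm_time() {
    s=$1
    printf '%d-%02d:%02d:%02d\n' \
        $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Example values only. The command is printed rather than executed so it can
# be checked first; remove "echo" to actually submit.
echo sbatch -c 4 --mem=8000 -t "$(to_slurm_time 9000)" test-job.sh
# prints: sbatch -c 4 --mem=8000 -t 0-02:30:00 test-job.sh

# After submitting (not run automatically):
#   squeue -u "$USER"     # list your queued and running jobs
#   scancel JOBID         # cancel a job, using the ID printed by sbatch
```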

Accounts, Allocations, Partitions

Please see our help file about allocations on the Frontenac Cluster.


  1. This node is contributed and has a 3-hour time limit.
  2. AMD EPYC will show up as 4 NUMA nodes in Slurm.