Difference between revisions of "Hardware:Frontenac"
(8 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | The Frontenac cluster is CAC's | + | The Frontenac cluster is CAC's primary compute cluster. It features a set of hardware, a network configuration, a SLURM scheduler, an Lmod software module system, the CentOS operating system, and a set of compilers and related software. This page is intended to give an overview of its capabilities and provide a migration guide for new users. |
== Hardware == | == Hardware == | ||
Line 525: | Line 525: | ||
| avx2, sse3 | | avx2, sse3 | ||
| 256 GB | | 256 GB | ||
+ | |- | ||
+ | | cac100 | ||
+ | | 6226R | ||
+ | | 3.6 GHz | ||
+ | | 32 | ||
+ | | 16 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 191 GB | ||
+ | |- | ||
+ | | cac102 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac104 | ||
+ | | 6130 | ||
+ | | 2.1 GHz | ||
+ | | 32 | ||
+ | | 16 | ||
+ | | 2 | ||
+ | | avx2, sse3, 3xGP100 GPU | ||
+ | | 191 GB | ||
+ | |- | ||
+ | | cac105 | ||
+ | | 6130 | ||
+ | | 2.1 GHz | ||
+ | | 32 | ||
+ | | 16 | ||
+ | | 2 | ||
+ | | avx2, sse3, 3xGP100 GPU | ||
+ | | 191 GB | ||
+ | |- | ||
+ | | cac106 | ||
+ | | 6130 | ||
+ | | 2.1 GHz | ||
+ | | 32 | ||
+ | | 16 | ||
+ | | 2 | ||
+ | | avx2, sse3, 3xGP100 GPU | ||
+ | | 191 GB | ||
|- | |- | ||
| cac107 | | cac107 | ||
Line 552: | Line 597: | ||
| avx2, sse3, 1xV100 GPU | | avx2, sse3, 1xV100 GPU | ||
| 191 GB | | 191 GB | ||
+ | |- | ||
+ | | cac111<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref> | ||
+ | | EPYC 7551P | ||
+ | | 2.0 GHz | ||
+ | | 32<ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref> | ||
+ | | 32 | ||
+ | | 1 | ||
+ | | avx2, sse3, 1xTitan GPU | ||
+ | | 128 GB | ||
+ | |- | ||
+ | | cac112<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref> | ||
+ | | EPYC 7551P | ||
+ | | 2.0 GHz | ||
+ | | 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref> | ||
+ | | 32 | ||
+ | | 1 | ||
+ | | avx2, sse3, 1xRTX4000 GPU | ||
+ | | 128 GB | ||
+ | |- | ||
+ | | cac113<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref> | ||
+ | | EPYC 7551P | ||
+ | | 2.0 GHz | ||
+ | | 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref> | ||
+ | | 32 | ||
+ | | 1 | ||
+ | | avx2, sse3, 1xRTX4000 GPU | ||
+ | | 128 GB | ||
+ | |- | ||
+ | | cac114<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref> | ||
+ | | EPYC 7551P | ||
+ | | 2.0 GHz | ||
+ | | 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref> | ||
+ | | 32 | ||
+ | | 1 | ||
+ | | avx2, sse3, 2xRTX4000 GPU | ||
+ | | 128 GB | ||
+ | |- | ||
+ | | cac115<ref name=contrib>This node is contributed and has a 3 hour time limit.</ref> | ||
+ | | EPYC 7551P | ||
+ | | 2.0 GHz | ||
+ | | 32 <ref name=numa>AMD EPYC will show up as 4 NUMA nodes in Slurm.</ref> | ||
+ | | 32 | ||
+ | | 1 | ||
+ | | avx2, sse3, 1xRTX4000 GPU | ||
+ | | 128 GB | ||
+ | |- | ||
+ | | cac116 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac117 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac118 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac119 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac120 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac121 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac122 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac123 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac124 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac125 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac126 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac127 | ||
+ | | 6338 | ||
+ | | 2.0 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3 | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac140 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac141 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac142 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac143 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac144 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac145 | ||
+ | | EPYC 7443 | ||
+ | | 2.8 GHz | ||
+ | | 48 | ||
+ | | 24 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA30 GPU 24GB | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac200 | ||
+ | | 8362 | ||
+ | | 2.8 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xA100 GPU | ||
+ | | 512 GB | ||
+ | |- | ||
+ | | cac201 | ||
+ | | 6430 | ||
+ | | 2.1 GHz | ||
+ | | 64 | ||
+ | | 32 | ||
+ | | 2 | ||
+ | | avx2, sse3, 2xL4 GPU | ||
+ | | 256 GB | ||
|- | |- | ||
|} | |} | ||
Line 602: | Line 872: | ||
Please check out our helpfile about [[Allocation|allocations on the Frontenac Cluster]] | Please check out our helpfile about [[Allocation|allocations on the Frontenac Cluster]] | ||
+ | |||
+ | ---- |
Latest revision as of 13:51, 19 December 2023
The Frontenac cluster is CAC's primary compute cluster. It comprises the compute hardware, a network configuration, the SLURM scheduler, the Lmod software module system, the CentOS operating system, and a set of compilers and related software. This page gives an overview of its capabilities and provides a migration guide for new users.
Hardware
The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and serves as a basic usage guide.
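When sizing a job, it can help to relate a node's memory to its core count. The following is a minimal sketch, not an official sizing rule: it assumes the fourth column of the node table in the revision diff is the total core count and the last column is the nominal memory, and the helper name `mem_per_core_mb` is made up for this example. The memory actually available to jobs is usually somewhat lower than the nominal figure.

```shell
#!/bin/bash
# Nominal memory per core in MB: total GB * 1024 / cores (integer division).
mem_per_core_mb() {
    local total_gb=$1 cores=$2
    echo $(( total_gb * 1024 / cores ))
}

# Example values copied from the node table (cac102: 512 GB, 64 cores).
mem_per_core_mb 512 64

# On the cluster itself, sinfo can list each node with its CPU count,
# memory in MB, features and generic resources such as GPUs.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -N -o "%N %c %m %f %G"
fi
```

The per-core figure gives a rough upper bound for the `--mem` value to pair with a given `-c` request on that node type.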
Documentation
- Migrating off the Frontenac System
- Logging on to the system
- List of installed software and how to use it
- Storage and filesystems
- Submitting jobs using SLURM
- SLURM accounting and special job submission
Quickstart
For those who want to just log on and get started with the new system, the bare essentials are shown below.
Logging on
The Frontenac cluster can be accessed only via SSH. You will need an SSH client such as Terminal on Linux/macOS or MobaXterm on Windows. To log on to the cluster, run the following command in your SSH client of choice:
ssh -X yourUserName@login.cac.queensu.ca
The first time you log on, you will be prompted to accept this server's RSA key fingerprint (d0:9f:e9:e2:b0:fe:6b:56:bb:74:46:c5:fb:89:a4:41). Type "yes" to proceed, then enter your password normally. No characters appear while you type your password.
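For frequent logins, the command above can be captured in an OpenSSH client configuration entry. The sketch below is one possible setup rather than a CAC recommendation: the alias `frontenac` and the helper name `add_frontenac_alias` are invented for this example, and the demonstration writes to a scratch file instead of the real `~/.ssh/config`.

```shell
#!/bin/bash
# Append a host alias for the login node to an OpenSSH client config file.
# "frontenac" is an arbitrary alias chosen for this example.
add_frontenac_alias() {
    local config_file=$1 user_name=$2
    cat >> "$config_file" <<EOF
Host frontenac
    HostName login.cac.queensu.ca
    User $user_name
    ForwardX11 yes
EOF
}

# Demonstration against a scratch file rather than the real ~/.ssh/config.
demo_config=$(mktemp)
add_frontenac_alias "$demo_config" yourUserName
cat "$demo_config"
```

With such an entry in `~/.ssh/config`, `ssh frontenac` should behave like the full command, with `ForwardX11 yes` taking the place of the `-X` flag. Recent OpenSSH clients display SHA256 fingerprints by default; adding `-o FingerprintHash=md5` to the first connection shows the colon-separated form quoted above.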
Filesystems
The Frontenac cluster uses a shared GPFS filesystem for all file storage. User files are located under /global/home, shared project space under /global/project, and network scratch space under /global/scratch. In addition to network storage, each compute node has a 1.5 TB local hard disk that jobs can use as fast local scratch space, at the location given by the $TMPDISK environment variable.
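A common way to use node-local scratch is to stage input onto it, work there, and copy the results back to network storage. The sketch below illustrates that pattern under stated assumptions: the function name `run_on_local_scratch`, the file names, and the line-counting workload are placeholders, and the `/tmp` fallback exists only so the example also runs outside a job where `$TMPDISK` is not set.

```shell
#!/bin/bash
# Stage an input file to node-local scratch, process it there,
# and copy the result back to its final location.
run_on_local_scratch() {
    local input=$1 output=$2
    # TMPDISK is expected inside jobs; /tmp is a fallback for this demo only.
    local scratch_root=${TMPDISK:-/tmp}
    local work_dir
    work_dir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1
    cp "$input" "$work_dir/input.dat"
    # Placeholder workload: count the lines of the input file.
    wc -l < "$work_dir/input.dat" > "$work_dir/result.txt"
    cp "$work_dir/result.txt" "$output"
    # Clean up so the local disk does not fill with leftovers.
    rm -rf "$work_dir"
}

# Small demonstration with a generated two-line input file.
sample=$(mktemp)
printf 'line one\nline two\n' > "$sample"
run_on_local_scratch "$sample" "$sample.count"
cat "$sample.count"
```

In a real job, the input and output paths would typically point into /global/scratch or /global/project.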
Submitting jobs
Frontenac uses the SLURM scheduler instead of Sun Grid Engine. The sbatch command is used to submit jobs, squeue can be used to check the status of jobs, and scancel can be used to kill a job. For users looking to get started with SLURM as fast as possible, a minimalist template job script is shown below:
#!/bin/bash
#SBATCH -c num_cpus                    # Number of CPUs requested. If omitted, the default is 1 CPU.
#SBATCH --mem=megabytes                # Memory requested in megabytes. If omitted, the default is 1024 MB.
#SBATCH -t days-hours:minutes:seconds  # How long will your job run for? If omitted, the default is 3 hours.

# some demo commands to use as a test
echo 'starting test job...'
sleep 120
echo 'our job worked!'
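To make the placeholders in the template concrete, the following sketch prints a filled-in copy of the job script. The generator function `make_job_script` and the values used (4 CPUs, 8192 MB, 1 hour) are illustrative assumptions, not recommended settings.

```shell
#!/bin/bash
# Print a job script with the template's placeholders filled in.
make_job_script() {
    local cpus=$1 mem_mb=$2 time_limit=$3
    cat <<EOF
#!/bin/bash
#SBATCH -c $cpus
#SBATCH --mem=$mem_mb
#SBATCH -t $time_limit

echo 'starting test job...'
sleep 120
echo 'our job worked!'
EOF
}

# Example: 4 CPUs, 8192 MB of memory, and a time limit of 0 days, 1 hour.
make_job_script 4 8192 0-01:00:00
```

Redirecting the output, for example `make_job_script 4 8192 0-01:00:00 > test-job.sh`, produces a file that can be passed to sbatch.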
Assuming our job script is called test-job.sh, we can submit it with sbatch test-job.sh. Detailed documentation can be found on our SLURM documentation page. Finally, note that an interactive job can be started with srun --x11 --pty bash. This starts a personal bash shell on a node with available resources.
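The three commands can be chained by capturing the job ID that sbatch reports on submission. This is a rough sketch: the helper name `job_id_from_sbatch` and the sample job ID are made up, and the live part runs only where sbatch and a `test-job.sh` file are actually present.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation line,
# which has the form "Submitted batch job <id>".
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Parsing demonstrated on a sample confirmation line (the ID is made up).
echo "Submitted batch job 123456" | job_id_from_sbatch

# On a login node with SLURM available, the helper ties the commands together.
if command -v sbatch >/dev/null 2>&1 && [ -f test-job.sh ]; then
    job_id=$(sbatch test-job.sh | job_id_from_sbatch)
    squeue -j "$job_id"      # check the job's status
    # scancel "$job_id"      # uncomment to kill the job
fi
```

Running `squeue -u "$USER"` instead lists all of your own jobs rather than a single one.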
Accounts, Allocations, Partitions
Please see our help file on allocations on the Frontenac Cluster.