Difference between revisions of "Hardware:SW"

From CAC Wiki
 
(71 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= The SW (Linux) Cluster =
+
{|  style="border-spacing: 8px;"
 +
| valign="top" width="50%" style="padding:1em; border:1px solid #fa5882; background-color:#f6eee3; border-radius:7px" |
 +
'''The SW cluster has been decommissioned. Please refer to the [[Hardware:Frontenac|Frontenac Cluster]].'''
 +
<center>
 +
|}
 +
 
 +
== The SW (Linux) Cluster ==
  
[[File:sw_rack_280512.jpg|thumb|left|alt=Software Linux Cluster|Software Linux Cluster]]
+
The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.<br clear=all>
The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.
+
  
 
{|  style="border-spacing: 8px;"
 
{|  style="border-spacing: 8px;"
 
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
 
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#e1eaf1; border-radius:7px" |
== Type of Hardware ==
 
  
Our cluster consists of multiple X86 multicore nodes made by Dell and IBM (both based on Intel x5670 or E7-4860). All nodes  run CentOS Linux and share a file system. Access is handled by Grid Engine. The server nodes are called sw0011...sw0054.
+
{| class="wikitable" style="float:left; margin-right: 25px;"
 
+
!colspan="6"| '''SW (Linux) Cluster Nodes ("old" sw series)'''
* Presently, the workup node of the HPCVL "Software Cluster" is '''swlogin1'''. This is a [http://www.dell.com/downloads/emea/products/R410_spec_sheet.pdf Dell PowerEdge R410 Server] with 2 sockets with a 6-core Intel® Xeon® processor (Intel x5675) running at 3.1 GHz.
+
|-
 
+
|'''Host'''
* Most of the nodes of the SW cluster (sw0015-40) are [http://www.dell.com/downloads/emea/products/R410_spec_sheet.pdf Dell PowerEdge R410] Servers that have 2 sockets with a 6-core Intel Xeon processor ([http://ark.intel.com/products/47920/Intel-Xeon-Processor-X5670-12M-Cache-2_93-GHz-6_40-GTs-Intel-QPI Intel x5670] / [http://ark.intel.com/products/52577/Intel-Xeon-Processor-X5675-12M-Cache-3_06-GHz-6_40-GTs-Intel-QPI x5675]) that runs at 2.9/3.07 GHz. These nodes offer a total of 12 cores that are 2-fold hyperthreaded, i.e. they support up to 24 threads. The scheduler is configured such that only 12 threads are run at a time. These nodes have 64 Gbyte (sw0015-22, sw0035-40) or 32 Gbyte (sw0023-34) of physical memory.
+
|'''CPU model'''
 +
|'''Speed'''
 +
|'''Cores'''
 +
|'''Threads'''
 +
|'''Memory'''
 +
|-
 +
| sw0044
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| sw0045
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| sw0046
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| sw0047
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| sw0048
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| sw0049
 +
| Xeon E7-4860
 +
| 2.3GHz
 +
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
!colspan="6"| [[File:x3950.jpg|thumb|left|alt=Software (SW) Linux Cluster|Software (SW) Linux Cluster]]
 +
|}
  
* A few nodes (sw0041-51) are [https://lenovopress.com/tips0817 '''IBM XServers 3850-X5'''] that are also based on the Intel® Xeon® processor ([http://ark.intel.com/products/53571/Intel-Xeon-Processor-E7-4860-24M-Cache-2_26-GHz-6_40-GTs-Intel-QPI Intel E7-4860]). These servers have a total of 40 cores per node and support for up to 80 threads (hyperthreading). The clock speed for these machines is 2.27GHz. Two of these servers (sw0050-51) have a 1 TB of physical memory, the others have 256 GB.
+
{| class="wikitable" style="float:left; margin-right: 25px;"
 
+
!colspan="6"| '''SW (Linux) Cluster Nodes ("new" cac series)'''
* Finally, two of our nodes (sw0052,sw0054) are '''IBM Servers''' based on the [http://ark.intel.com/products/53572/Intel-Xeon-Processor-E7-8860-24M-Cache-2_26-GHz-6_40-GTs-Intel-QPI Intel E7-8860] processors with 80 cores total (160 threads) running at 2.27 GHz, while another one (sw0053) with 80 cores (160 threads) uses the E7-8870 at 2.4 GHz. Each of the three have 512 GB of memory.
+
 
+
{| class="wikitable"
+
!colspan="6"| '''SW (Linux) Cluster Nodes'''
+
 
|-
 
|-
 
|'''Host'''  
 
|'''Host'''  
Line 28: Line 77:
 
|'''Memory'''
 
|'''Memory'''
 
|-
 
|-
| sw0013
+
| cac019
| Xeon X5675
+
| E7-4860
| 3.07GHz
+
| 2.3 GHz
| 12
+
| 40
 +
| 80
 +
| 256 GB
 +
|-
 +
| cac020
 +
| E7-4830 v3
 +
| 2.1 GHz
 +
| 48
 +
| 96
 +
| 1.2 TB
 +
|-
 +
| cac021
 +
| E7-4830 v3
 +
| 2.1 GHz
 +
| 48
 +
| 96
 +
| 1.2 TB
 +
|-
 +
| cac022
 +
| E7-8860
 +
| 2.3 GHz
 +
| 80
 +
| 160
 +
| 512 GB
 +
|-
 +
| cac023
 +
| E7-8860
 +
| 2.4 GHz
 +
| 80
 +
| 160
 +
| 512 GB
 +
|-
 +
| cac024
 +
| E7-8860
 +
| 2.4 GHz
 +
| 80
 +
| 160
 +
| 512 GB
 +
|-
 +
| cac025
 +
| E7-4860
 +
| 2.3 GHz
 +
| 40
 +
| 80
 +
| 1 TB
 +
|-
 +
| cac026
 +
| E7-4860
 +
| 2.3 GHz
 +
| 40
 +
| 80
 +
| 1 TB
 +
|-
 +
| cac027
 +
| E7-8850 v2
 +
| 2.3 GHz
 +
| 48
 +
| 96
 +
| 256 GB
 +
|-
 +
| cac028
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac028
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac029
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac030
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac032
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac033
 +
| E7-8867 v3
 +
| 2.5 GHz
 +
| 128
 +
| 256
 +
| 2 TB
 +
|-
 +
| cac034
 +
| E5-2650 v4
 +
| 2.2 GHz
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0014
+
| cac035
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0015
+
| cac036
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0016
+
| cac037
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0017
+
| cac038
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0018
+
| cac039
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0019
+
| cac040
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0020
+
| cac041
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0021
+
| cac042
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0022
+
| cac043
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0023
+
| cac044
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0024
+
| cac045
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0025
+
| cac046
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0026
+
| cac047
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0027
+
| cac048
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0028
+
| cac049
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0029
+
| cac050
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0030
+
| cac051
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0031
+
| cac052
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0032
+
| cac053
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0033
+
| cac054
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0034
+
| cac055
| Xeon X5675
+
| E5-2650 v4
| 3.07GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 32 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0035
+
| cac056
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0036
+
| cac057
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0037
+
| cac058
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0038
+
| cac059
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|
 +
| 256 GB  
 
|-
 
|-
| sw0039
+
| cac060
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0040
+
| cac061
| Xeon X5670
+
| E5-2650 v4
| 2.93GHz
+
| 2.2 GHz
| 12
+
 
| 24  
 
| 24  
| 64 GB  
+
|  
 +
| 256 GB  
 
|-
 
|-
| sw0041
+
| cac062
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0042
+
| cac063
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0043
+
| cac064
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0044
+
| cac065
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0045
+
| cac066
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0046
+
| cac067
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0047
+
| cac068
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0048
+
| cac069
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|  
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0049
+
| cac070
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|  
 
| 256 GB  
 
| 256 GB  
 
|-
 
|-
| sw0050
+
| cac071
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|  
| 1 TB
+
| 256 GB
 
|-
 
|-
| sw0051
+
| cac072
| Xeon E7- 4860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 40
+
| 24
| 80
+
|  
| 1 TB
+
| 256 GB
 
|-
 
|-
| sw0052
+
| cac073
| Xeon E7- 8860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 80
+
| 24
| 160
+
|  
| 512 GB  
+
| 256 GB  
 
|-
 
|-
| sw0053
+
| cac074
| Xeon E7- 8870
+
| E5-2650 v4
| 2.40GHz
+
| 2.2 GHz
| 80
+
| 24
| 160
+
|  
| 512 GB  
+
| 256 GB  
 
|-
 
|-
| sw0054
+
| cac075
| Xeon E7- 8860
+
| E5-2650 v4
| 2.27GHz
+
| 2.2 GHz
| 80
+
| 24
| 160
+
|  
| 512 GB  
+
| 256 GB  
 
|-
 
|-
| sw0055
+
| cac076
| Xeon X5680
+
| E5-2650 v4
| 3.33GHz
+
| 2.2 GHz
| 12
+
| 24
| 24
+
|  
| 144 GB  
+
| 256 GB  
 
|-
 
|-
| sw0056
+
| cac077
| Xeon X5680
+
| E5-2650 v4
| 3.33GHz
+
| 2.2 GHz
| 12
+
| 24
| 24
+
|
| 144 GB  
+
| 256 GB
 +
|-
 +
| cac078
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24  
 +
|  
 +
| 256 GB
 +
|-
 +
| cac079
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac080
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac081
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac082
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac083
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac084
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac085
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac086
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac087
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac088
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac089
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac090
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac091
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac092
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB
 +
|-
 +
| cac093
 +
| E5-2650 v4
 +
| 2.2 GHz
 +
| 24
 +
|
 +
| 256 GB  
 
|-
 
|-
| sw0057
 
| Xeon X5680
 
| 3.33GHz
 
| 12
 
| 24
 
| 144 GB
 
|}
 
 
|}
 
|}
==Why these Systems?==
 
  
The main emphasis in these systems is high floating-point performance for a modest number of processes or threads. Since commercial software such as Fluent and Abaqus is increasingly focussed on support for Linux only, this cluster was acquired to continue to offer recent versions of these software packages. In addition, the higher single-core performance of these nodes (compared to the Sparc/Solaris-based M9000 cluster, for instance) allows for more efficient use of license seats, which are usually priced per core.
+
== Type of Hardware ==
 +
This cluster consists of X86 multicore nodes made by Lenovo and IBM. All nodes run CentOS Linux and share a file system. Access is handled by Grid Engine. The server nodes are called cac019...cac099.
  
==Who Should Use This Cluster?==
+
* Presently, the workup node of the "Software Cluster" is '''swlogin1'''. This is a [http://www.dell.com/downloads/emea/products/R410_spec_sheet.pdf Dell PowerEdge R410 Server] with two sockets, each holding a 6-core Intel® Xeon® processor (Intel x5675) running at 3.1 GHz.
  
The software cluster runs on the Linux operating system, and should therefore only be used if the software cannot be compiled or run on the Sparc/Solaris platform. Runs that require more than 64 Gbyte of memory should be performed on the M9000 cluster unless the program is parallelized using MPI with distributed memory and very low communication requirements.
+
* Most nodes on the cluster were built by Lenovo and are of the [https://lenovopress.com/tips1195-nextscale-nx360-m5-e5-2600-v3 Lenovo NeXtScale nx360 M5] type. Each is based on two [http://ark.intel.com/products/91767/Intel-Xeon-Processor-E5-2650-v4-30M-Cache-2_20-GHz Intel Xeon E5-2650 v4 12-core CPUs] that run at 2.2 GHz, for a total of 24 cores per node.
  
We suggest you '''consider using this compute cluster if'''
+
* Larger high-memory nodes were added at the same time (August 2016). These are of the [http://www.lenovo.com/images/products/system-x/pdfs/datasheets/x3950_x6_ds.pdf Lenovo System x3950 x6] 8-socket type with 8 x [http://ark.intel.com/products/84681/Intel-Xeon-Processor-E7-8867-v3-45M-Cache-2_50-GHz Intel E7-8867 v3] 16-core processors at 2.5 GHz for a total of 128 cores (dually hyperthreaded). Each of these units has a total of 2 TB of memory. They are used for special applications that require high memory.
  
* Your application is very floating-point intensive and has little else to do, and you need modest amounts of memory. With few exceptions no more than 64 Gbyte are available in the form of shared memory.
+
* Some older nodes are [https://lenovopress.com/tips0817 '''IBM XServers 3850-X5'''] that are also based on the Intel® Xeon® processor ([http://ark.intel.com/products/53571/Intel-Xeon-Processor-E7-4860-24M-Cache-2_26-GHz-6_40-GTs-Intel-QPI Intel E7-4860]). These servers have a total of 40 cores per node and support up to 80 threads (hyperthreading). The clock speed for these machines is 2.27 GHz. Two of these servers (sw0050-51) have 1 TB of physical memory; the others have 256 GB.
  
* Your application is commercial or public-domain software that supports only Linux or poses considerable challenges to port to a Solaris platform.
+
* A few of our nodes are '''IBM Servers''' based on [http://ark.intel.com/products/53572/Intel-Xeon-Processor-E7-8860-24M-Cache-2_26-GHz-6_40-GTs-Intel-QPI Intel E7-8860] processors with 80 cores total (160 threads) running at 2.27 GHz, while another one (sw0053) with 80 cores (160 threads) uses the E7-8870 at 2.4 GHz. Each of these has 512 GB of memory.
  
* Your application is either explicitly parallel (for instance, using MPI) and has very low communication requirements, or is multi-threaded with a small number (typically no more than 12) of scaling threads.
+
==Why these Systems?==
  
* Your application uses a commercial license that is scaled per process; in such cases it is favourable to use machines with the maximum per-process performance.
+
The main emphasis in these systems is high floating-point performance for a modest number of processes or threads. Since commercial software such as Fluent and Abaqus offers support for Linux only, this cluster was originally acquired to offer recent versions of these software packages. In addition, the higher single-core performance of these nodes allows for efficient use of license seats, which are usually priced per core.
  
'''This cluster might not be suitable if'''
+
==Who Should Use This Cluster?==
  
* You need to perform a large number of relatively short jobs, each serial or with very few threads. Jobs like this should be sent to the "Victoria Falls" cluster.
+
The software cluster runs on the Linux operating system and should be used by anyone who wants to run applications that are available on that platform. Runs that require more than 32 GB of memory must request it explicitly to avoid mis-scheduling.
  
* Your application is memory intensive and/or compiles and runs well on the Solaris/Sparc platform. Such jobs should be sent to the default M9000 cluster.
+
We suggest you use this cluster if:
  
* Your application is required to scale to a very large number of processes in a distributed-memory fashion and is communication intensive. Such jobs require a fast interconnect (Infiniband or similar) and should be run on a different system, for instance other Compute Canada installations.
+
* Your application is floating-point intensive with modest amounts of memory.  
  
If you think your application could run more efficiently on these machines, please contact us (help@hpcvl.org) to discuss any concerns and let us assist you in getting started.
+
* Your application is commercial or public-domain software that supports Linux.
  
Note that on this cluster (as on the M9000s), we have to enforce dedicated cores or CPUs to avoid sharing and context-switching overheads. No "overloading" can be allowed.
+
* Your application is explicitly parallel (for instance, using MPI) and has low communication requirements, or is multi-threaded with a small number (typically no more than 12) of scaling threads.
  
==How Do I Use This Cluster?==
+
* Your application uses a commercial license that is scaled per process.
  
===... to access===
+
This cluster may not be suitable if:
  
The [https://portal.hpcvl.queensu.ca HPCVL Secure Portal] offers a direct link called '''xterm (linux login node)'''. This link opens a terminal on swlogin1, which is designated as the login/workup node for the cluster. If you encounter issues with the portal login, please let us know. Meanwhile, it is possible to "ssh" directly from sflogin0 to swlogin1 by typing
+
* Your application is very memory intensive. Such jobs may face long waiting times.
 +
 
 +
* Your application is required to scale to a very large number of processes in a distributed-memory fashion and is communication intensive. Such jobs require a fast interconnect (Infiniband or similar) and should be run on a different system, for instance other Compute Canada installations.
 +
 
 +
If you think your application could run more efficiently on these machines, please contact us (cac.help@queensu.ca) to discuss any concerns and let us assist you in getting started.
 +
 
 +
Note that we enforce dedicated cores or CPUs to avoid sharing and context-switching overheads. No "overloading" is allowed.
 +
|}
 +
 
 +
{|  style="border-spacing: 8px;"
 +
| valign="top" width="50%" style="padding:1em; border:1px solid #aaaaaa; background-color:#f7f7f7; border-radius:7px" |
 +
== Using the Cluster ==
  
:ssh sw0010
+
=== Access ===
  
and re-typing your system password.
+
* Indirectly through '''ssh from sflogin0''':
 +
<pre>ssh hpcXXXX@130.15.59.64
 +
hpcXXXX@130.15.59.64's password: *****
 +
hpcXXXX@sflogin0$ ssh swlogin1
 +
hpcXXXX@swlogin1's password: ***** </pre>
  
 
The file systems for all of our clusters are shared, so you will be using the same home directory as when you are using the M9000 servers or the standard login node sfnode0. swlogin1 can be used for compilation, program development, and testing only, '''not for production jobs'''.
 
The file systems for all of our clusters are shared, so you will be using the same home directory as when you are using the M9000 servers or the standard login node sfnode0. swlogin1 can be used for compilation, program development, and testing only, '''not for production jobs'''.
  
===... to compile and link===
+
=== Compiling Code ===
  
Since the SW cluster has a completely different architecture than the M9000 Servers code must be re-compiled when migrating to this cluster. The compiler that we are using on this cluster is the '''Intel Compiler Suite'''. This includes compilers for Fortran, C, and C++, as well as MPI and OpenMP support, debuggers and development suite. This software resides in /opt/ics and is only visible to the Linux cluster. The versions are:
+
==== Intel Compiler Suite ====
 +
The recommended compiler is the '''Intel Compiler Suite'''. It includes compilers for Fortran, C, and C++, as well as MPI and OpenMP support, debuggers, and a development suite. This software resides in /opt/ics. The versions are:
  
 
* Fortran ('''ifort'''): Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1 Build 20110811
 
* Fortran ('''ifort'''): Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1 Build 20110811
Line 395: Line 670:
  
 
This compiler suite needs to be activated before use. The command is
 
This compiler suite needs to be activated before use. The command is
: use ics
+
<pre>use icsmpi</pre>
  
In many cases, especially when software from the public domain is involved, the preferable compilers are the '''GNU C/C++/Fortran compilers'''. The system version of these is:
+
==== GNU Compilers ====
 +
In many cases, especially for public domain software, the preferred compilers are the '''GNU C/C++/Fortran''' compilers. The system version of these is:
 
<pre>
 
<pre>
 
Using built-in specs.
 
Using built-in specs.
Line 411: Line 687:
 
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
 
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
 
</pre>
 
</pre>
 +
 
No special activation is needed to use these, as they reside in a system directory. A newer version of this compiler set is available in /opt/gcc-4.8.3 and can be accessed using the command
 
No special activation is needed to use these, as they reside in a system directory. A newer version of this compiler set is available in /opt/gcc-4.8.3 and can be accessed using the command
: use gcc-4.8.3
 
  
For applications that cannot be re-compiled (for instance, because the source code is not accessible), a pre-compiled Linux version (x64 for Redhat will do the trick) needs to be obtained.
+
<pre>use gcc-4.8.3</pre>
  
===... to run jobs===
+
If MPI is required, it can be loaded through
  
As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes or if interactive use is unavoidable. In the latter case, please get in touch to let us know what you need. Production jobs must be submitted through the Grid Engine load scheduler. For a description of how to use Grid Engine, see the HPCVL GridEngine FAQ.
+
<pre>use openmpi</pre>
 +
 
 +
For applications that cannot be re-compiled (for instance, because the source code is not accessible), a pre-compiled Linux version needs to be obtained; an x86-64 build for Red Hat will work.
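
As a rough illustration of the compile step, the sketch below writes a small OpenMP test program and builds it with gcc if one is found on the PATH. The file names <code>hello_omp.c</code> and <code>hello_omp</code> are made up for this example, and the optimisation level is an arbitrary choice; <code>-fopenmp</code> is the standard GCC switch for OpenMP. On the cluster, the <code>use gcc-4.8.3</code> command described above would be issued first if the newer compiler is wanted.

```shell
#!/bin/sh
# Write a tiny OpenMP test program (hypothetical file name).
cat > hello_omp.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* One thread reports the size of the thread team. */
        #pragma omp single
        printf("running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}
EOF

# Build only if a gcc is on the PATH; -fopenmp enables OpenMP support.
if command -v gcc >/dev/null 2>&1; then
    gcc -O2 -fopenmp -o hello_omp hello_omp.c || echo "compile failed" >&2
fi
```

A short check such as <code>OMP_NUM_THREADS=4 ./hello_omp</code> is the kind of test run the login node is meant for; anything longer belongs in a scheduled job.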
  
Grid Engine will schedule jobs to a default pool of machines unless otherwise stated. This default pool contains presently only the M9000 nodes m9k0001-8. Therefore, you need to add the following two lines to your script for your job to be scheduled to the Linux SW cluster exclusively:
+
=== Running Jobs ===
  
: #$ -q abaqus.q
+
As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes or if interactive use is unavoidable. In the latter case, please get in touch to let us know what you need. Production jobs must be submitted through the [[HowTo:Scheduler|Grid Engine load scheduler]].  
: #$ -l qname=abaqus.q
+
  
The abaqus name for the queue that is added here derives from the initial software Abaqus that was (and still is) run on this cluster.
+
The name for the SGE queue that schedules to this cluster is '''abaqus.q'''. This does not have to be specified as it is the default.
 +
The queue is named after Abaqus, the first software package that was (and still is) run on this cluster.
  
 
Note that your jobs will run on dedicated threads, i.e. typically up to 12 processes can be scheduled to a single node. The Grid Engine will do the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
 
Note that your jobs will run on dedicated threads, i.e. typically up to 12 processes can be scheduled to a single node. The Grid Engine will do the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
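
To make the submission step more concrete, here is a minimal sketch of a Grid Engine job script, written to a file by a small shell wrapper. Only the queue name '''abaqus.q''' and the limit of 12 threads per node come from this page. The job name, the output file names, the program <code>./my_program</code> and, in particular, the parallel environment name <code>shm.pe</code> are placeholders; the actual parallel environment names are site-specific and are not given here.

```shell
#!/bin/sh
# Write a hypothetical Grid Engine job script to sw_job.sh.
cat > sw_job.sh <<'EOF'
#!/bin/bash
# Lines starting with "#$" are read by Grid Engine as qsub options.
#$ -S /bin/bash
#$ -N sw_test
#$ -cwd
#$ -V
# Queue for this cluster; optional, since it is the default.
#$ -q abaqus.q
# Request 12 slots. "shm.pe" is a placeholder parallel environment name.
#$ -pe shm.pe 12
#$ -o sw_test.out
#$ -e sw_test.err

# NSLOTS is set by Grid Engine to the number of slots granted.
export OMP_NUM_THREADS=$NSLOTS
./my_program
EOF

# On the login node the script would then be submitted with:
#   qsub sw_job.sh
```

Tying the thread count to <code>$NSLOTS</code> rather than a fixed number keeps the program from starting more threads than the scheduler has dedicated to the job.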
Line 431: Line 709:
 
===Help?===
 
===Help?===
  
General information about using HPCVL facilities can be found in our FAQ pages. We also supply user support (please [mailto:help@hpcvl.org send email to help@hpcvl.org] or [[Contacts:UserSupport|contact us directly]]), so if you experience problems, we can assist you.
+
General information about using CAC facilities can be found in our FAQ pages. We also supply user support (please [mailto:cac.help@queensu.ca send email to cac.help@queensu.ca] or [[Contacts:UserSupport|contact us directly]]), so if you experience problems, we can assist you.

Latest revision as of 13:36, 19 January 2018

The SW cluster has been decommissioned. Please refer to the Frontenac Cluster.

The SW (Linux) Cluster

The Centre for Advanced Computing operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.

SW (Linux) Cluster Nodes ("old" sw series)
Host     CPU model      Speed     Cores   Threads   Memory
sw0044   Xeon E7-4860   2.3 GHz   40      80        256 GB
sw0045   Xeon E7-4860   2.3 GHz   40      80        256 GB
sw0046   Xeon E7-4860   2.3 GHz   40      80        256 GB
sw0047   Xeon E7-4860   2.3 GHz   40      80        256 GB
sw0048   Xeon E7-4860   2.3 GHz   40      80        256 GB
sw0049   Xeon E7-4860   2.3 GHz   40      80        256 GB
Software (SW) Linux Cluster
SW (Linux) Cluster Nodes ("new" cac series)
Host     CPU model    Speed     Cores   Threads   Memory
cac019   E7-4860      2.3 GHz   40      80        256 GB
cac020   E7-4830 v3   2.1 GHz   48      96        1.2 TB
cac021   E7-4830 v3   2.1 GHz   48      96        1.2 TB
cac022   E7-8860      2.3 GHz   80      160       512 GB
cac023   E7-8860      2.4 GHz   80      160       512 GB
cac024   E7-8860      2.4 GHz   80      160       512 GB
cac025   E7-4860      2.3 GHz   40      80        1 TB
cac026   E7-4860      2.3 GHz   40      80        1 TB
cac027   E7-8850 v2   2.3 GHz   48      96        256 GB
cac028   E7-8867 v3   2.5 GHz   128     256       2 TB
cac028   E7-8867 v3   2.5 GHz   128     256       2 TB
cac029   E7-8867 v3   2.5 GHz   128     256       2 TB
cac030   E7-8867 v3   2.5 GHz   128     256       2 TB
cac032   E7-8867 v3   2.5 GHz   128     256       2 TB
cac033   E7-8867 v3   2.5 GHz   128     256       2 TB
cac034   E5-2650 v4   2.2 GHz   24                256 GB
cac035   E5-2650 v4   2.2 GHz   24                256 GB
cac036   E5-2650 v4   2.2 GHz   24                256 GB
cac037   E5-2650 v4   2.2 GHz   24                256 GB
cac038   E5-2650 v4   2.2 GHz   24                256 GB
cac039   E5-2650 v4   2.2 GHz   24                256 GB
cac040   E5-2650 v4   2.2 GHz   24                256 GB
cac041   E5-2650 v4   2.2 GHz   24                256 GB
cac042   E5-2650 v4   2.2 GHz   24                256 GB
cac043   E5-2650 v4   2.2 GHz   24                256 GB
cac044   E5-2650 v4   2.2 GHz   24                256 GB
cac045   E5-2650 v4   2.2 GHz   24                256 GB
cac046   E5-2650 v4   2.2 GHz   24                256 GB
cac047   E5-2650 v4   2.2 GHz   24                256 GB
cac048   E5-2650 v4   2.2 GHz   24                256 GB
cac049   E5-2650 v4   2.2 GHz   24                256 GB
cac050   E5-2650 v4   2.2 GHz   24                256 GB
cac051   E5-2650 v4   2.2 GHz   24                256 GB
cac052   E5-2650 v4   2.2 GHz   24                256 GB
cac053   E5-2650 v4   2.2 GHz   24                256 GB
cac054   E5-2650 v4   2.2 GHz   24                256 GB
cac055   E5-2650 v4   2.2 GHz   24                256 GB
cac056   E5-2650 v4   2.2 GHz   24                256 GB
cac057   E5-2650 v4   2.2 GHz   24                256 GB
cac058   E5-2650 v4   2.2 GHz   24                256 GB
cac059   E5-2650 v4   2.2 GHz   24                256 GB
cac060   E5-2650 v4   2.2 GHz   24                256 GB
cac061   E5-2650 v4   2.2 GHz   24                256 GB
cac062   E5-2650 v4   2.2 GHz   24                256 GB
cac063   E5-2650 v4   2.2 GHz   24                256 GB
cac064   E5-2650 v4   2.2 GHz   24                256 GB
cac065   E5-2650 v4   2.2 GHz   24                256 GB
cac066   E5-2650 v4   2.2 GHz   24                256 GB
cac067   E5-2650 v4   2.2 GHz   24                256 GB
cac068   E5-2650 v4   2.2 GHz   24                256 GB
cac069   E5-2650 v4   2.2 GHz   24                256 GB
cac070   E5-2650 v4   2.2 GHz   24                256 GB
cac071   E5-2650 v4   2.2 GHz   24                256 GB
cac072   E5-2650 v4   2.2 GHz   24                256 GB
cac073   E5-2650 v4   2.2 GHz   24                256 GB
cac074   E5-2650 v4   2.2 GHz   24                256 GB
cac075   E5-2650 v4   2.2 GHz   24                256 GB
cac076   E5-2650 v4   2.2 GHz   24                256 GB
cac077   E5-2650 v4   2.2 GHz   24                256 GB
cac078   E5-2650 v4   2.2 GHz   24                256 GB
cac079   E5-2650 v4   2.2 GHz   24                256 GB
cac080   E5-2650 v4   2.2 GHz   24                256 GB
cac081   E5-2650 v4   2.2 GHz   24                256 GB
cac082   E5-2650 v4   2.2 GHz   24                256 GB
cac083   E5-2650 v4   2.2 GHz   24                256 GB
cac084   E5-2650 v4   2.2 GHz   24                256 GB
cac085   E5-2650 v4   2.2 GHz   24                256 GB
cac086   E5-2650 v4   2.2 GHz   24                256 GB
cac087   E5-2650 v4   2.2 GHz   24                256 GB
cac088   E5-2650 v4   2.2 GHz   24                256 GB
cac089   E5-2650 v4   2.2 GHz   24                256 GB
cac090   E5-2650 v4   2.2 GHz   24                256 GB
cac091   E5-2650 v4   2.2 GHz   24                256 GB
cac092   E5-2650 v4   2.2 GHz   24                256 GB
cac093   E5-2650 v4   2.2 GHz   24                256 GB

Type of Hardware

This cluster consists of X86 multicore nodes made by Lenovo and IBM. All nodes run CentOS Linux and share a file system. Access is handled by Grid Engine. The server nodes are called cac019...cac099.

  • Presently, the workup node of the "Software Cluster" is swlogin1. This is a Dell PowerEdge R410 Server with two sockets, each holding a 6-core Intel® Xeon® processor (Intel x5675) running at 3.1 GHz.
  • Larger high-memory nodes were added at the same time (August 2016). These are of the Lenovo System x3950 x6 8-socket type with 8 x Intel E7-8867 v3 16-core processors at 2.5 GHz for a total of 128 cores (dually hyperthreaded). Each of these units has a total of 2 TB of memory. They are used for special applications that require high memory.
  • Some older nodes are IBM XServers 3850-X5 that are also based on the Intel® Xeon® processor (Intel E7-4860). These servers have a total of 40 cores per node and support up to 80 threads (hyperthreading). The clock speed for these machines is 2.27 GHz. Two of these servers (sw0050-51) have 1 TB of physical memory; the others have 256 GB.
  • A few of our nodes are IBM Servers based on Intel E7-8860 processors with 80 cores total (160 threads) running at 2.27 GHz, while another one (sw0053) with 80 cores (160 threads) uses the E7-8870 at 2.4 GHz. Each of these has 512 GB of memory.

Why these Systems?

The main emphasis in these systems is high floating-point performance for a modest number of processes or threads. Since commercial software such as Fluent and Abaqus offers support for Linux only, this cluster was originally acquired to offer recent versions of these software packages. In addition, the higher single-core performance of these nodes allows for efficient use of license seats, which are usually priced per core.

Who Should Use This Cluster?

The software cluster runs on the Linux operating system and should be used by anyone who wants to run applications that are available on that platform. Runs that require more than 32 GB of memory must request it explicitly to avoid mis-scheduling.
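
How a memory request is written depends on the resource names configured for the scheduler, which this page does not list. The following is only a sketch under two assumptions: that the standard Grid Engine attribute h_vmem is the one to use, and that it is counted per slot. The totals are invented example values. The script works out the per-slot share and prints the matching job-script line.

```shell
#!/bin/sh
# Hypothetical example values; adjust to the job at hand.
total_gb=48   # total memory the run needs, in GB
slots=12      # number of slots (threads or processes) requested

# Integer division rounded up, so the per-slot shares cover the total.
per_slot_gb=$(( (total_gb + slots - 1) / slots ))

# Assumption: h_vmem is the relevant resource and is counted per slot.
request="#\$ -l h_vmem=${per_slot_gb}G"
echo "$request"
```

The printed line would go into the job script next to the other scheduler options; if the site counts the resource per job rather than per slot, the total would be requested directly instead.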

We suggest you use this cluster if:

  • Your application is floating-point intensive with modest amounts of memory.
  • Your application is commercial or public-domain software that supports Linux.
  • Your application is explicitly parallel (for instance, using MPI) and has low communication requirements, or is multi-threaded with a small number (typically no more than 12) of scaling threads.
  • Your application uses a commercial license that is scaled per process.

This cluster may not be suitable if:

  • Your application is very memory intensive. Long waiting times may be the consequence.
  • Your application is required to scale to a very large number of processes in a distributed-memory fashion and is communication intensive. Such jobs require a fast interconnect (InfiniBand or similar) and should be run on a different system, for instance other Compute Canada installations.

If you think your application could run more efficiently on these machines, please contact us (cac.help@queensu.ca) to discuss any concerns and let us assist you in getting started.

Note that we have to enforce dedicated cores or CPUs to avoid sharing and context switching overheads. No "overloading" can be allowed.

Using the Cluster

Access

  • Indirectly through ssh from sflogin0:
ssh hpcXXXX@130.15.59.64
hpcXXXX@130.15.59.64's password: *****
hpcXXXX@sflogin0$ ssh swlogin1
hpcXXXX@swlogin1's password: ***** 
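
The two logins above can usually be combined into one command with the ProxyJump option of OpenSSH. The following is a minimal sketch that only prints the command (a dry run). It assumes your workstation has OpenSSH 7.3 or newer, which is not stated on this page, and it reuses the placeholder account hpcXXXX and the addresses from the listing above.

```shell
# Dry-run sketch: build and print a single two-hop ssh command.
user="hpcXXXX"            # placeholder account name, as in the listing above
gateway="130.15.59.64"    # address of sflogin0, taken from the listing above
target="swlogin1"         # login node of the SW cluster

# -J (ProxyJump) tunnels through the gateway; requires OpenSSH 7.3+ (assumption).
jump_cmd="ssh -J ${user}@${gateway} ${user}@${target}"
echo "$jump_cmd"
```

Running the printed command should prompt for the same two passwords as the manual procedure; if your ssh client does not accept -J, use the two-step login shown above.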

The file systems for all of our clusters are shared, so you will be using the same home directory as when you are using the M9000 servers or the standard login node sfnode0. swlogin1 can be used for compilation, program development, and testing only, not for production jobs.

Compiling Code

Intel Compiler Suite

The best compiler to use is the Intel Compiler Suite. This includes compilers for Fortran, C, and C++, as well as MPI and OpenMP support, debuggers, and a development suite. This software resides in /opt/ics. The versions are:

  • Fortran (ifort): Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1 Build 20110811
  • C (icc): Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1 Build 20110811
  • C++ (icpc): Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1 Build 20110811

This compiler suite needs to be activated before use. The command is

use icsmpi

Gnu Compilers

In many cases, especially for public domain software, the preferable compiler is GNU C/C++/Fortran. The system version of these, as reported by gcc -v, is:

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
--disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile 
--enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib 
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)

No special activation is needed to use these, as they reside in a system directory. A newer version of this compiler set is available in /opt/gcc-4.8.3 and can be accessed using the command

use gcc-4.8.3

If MPI is required, it can be loaded through

use openmpi
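
As a quick check that the GNU compiler is working, a small test program can be built and run on the login node. The sketch below is only an illustration: the file name hello.c and the -O2 flag are arbitrary choices, and the script falls back to a message if gcc is not on the path.

```shell
# Sketch: compile and run a tiny C program with whichever gcc is active.
# On the cluster, run "use gcc-4.8.3" beforehand to pick up the newer compiler.
workdir=$(mktemp -d)      # scratch directory for the test build

cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("hello from the SW cluster\n"); return 0; }
EOF

if command -v gcc >/dev/null 2>&1; then
    # Build with moderate optimization and capture the program output.
    gcc -O2 -o "$workdir/hello" "$workdir/hello.c"
    result=$("$workdir/hello")
else
    result="gcc not found"
fi
echo "$result"
```

For MPI codes, Open MPI provides the mpicc wrapper, which is called in the same way as gcc once the package has been loaded with use openmpi.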

For applications that cannot be re-compiled (for instance, because the source code is not accessible), a pre-compiled Linux version needs to be obtained (an x64 build for Red Hat will do the trick).

Running Jobs

As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes or if interactive use is unavoidable. In the latter case, please get in touch to let us know what you need. Production jobs must be submitted through the Grid Engine load scheduler.

The name of the SGE queue that schedules to this cluster is abaqus.q. It does not have to be specified, as it is the default. The queue is named after Abaqus, the initial software that was (and still is) run on this cluster.

Note that your jobs will run on dedicated threads, i.e. typically up to 12 processes can be scheduled to a single node. The Grid Engine will do the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
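
A production run is described in a small job script that is handed to Grid Engine. The following is a minimal sketch, not an official template: the output file names, the parallel environment name shm.pe, the memory resource name mem_free, and the program name my_program are all hypothetical and shown only as comments, so please ask us for the names that apply to your job. The lines starting with #$ are standard Grid Engine directives.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -q abaqus.q
#$ -o job.out
#$ -e job.err
# Hypothetical examples, left inactive as plain comments (names are site-specific):
#   request 12 slots:       -pe shm.pe 12
#   request more memory:    -l mem_free=64G

# Grid Engine sets NSLOTS to the number of granted slots; default to 1 elsewhere.
slots="${NSLOTS:-1}"

# Keep the thread count equal to the granted slots to avoid overloading cores.
export OMP_NUM_THREADS="$slots"
echo "running with $slots slot(s)"

# Replace with the actual program call, for example: ./my_program
```

Saved as job.sh (an arbitrary name), the script is submitted with qsub job.sh, and qstat shows its state in the queue. The -q line is included only for clarity, since abaqus.q is the default.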

Help?

General information about using CAC facilities can be found in our FAQ pages. We also supply user support (please send email to cac.help@queensu.ca or contact us directly), so if you experience problems, we can assist you.