Hardware:M9000
The Enterprise M9000 Servers
The "M9Ks" are our large SMP systems that have served as our main compute cluster. However, they have reached their "end of life" and will be decommissioned during 2016. These servers run on the Solaris platform, which will be discontinued along with them. Future replacements of these servers will run on a standard Linux platform (CentOS).
Type of server
Our cluster consists of eight shared-memory machines: high-end SPARC Enterprise M9000 servers, which Sun Microsystems built in partnership with Fujitsu. Access is handled exclusively by Grid Engine, including test jobs that are specific to these servers. The server nodes are called m9k0001...m9k0008.

Each of these servers contains 64 quad-core 2.52 GHz Sparc64 VII processors. Each chip has 4 compute cores, and each core is capable of Chip Multi Threading with 2 hardware threads. This means that each server can work on up to 512 threads simultaneously (64 x 4 x 2). In total, the eight servers can process more than 4000 threads (8 x 512 = 4096). As each core carries two floating-point units that can handle additions and multiplications in a "fused" manner (FMA), the cluster has a theoretical peak performance (TPP) of up to 20 TFlops.

Chip Multi Threading (CMT) is a technology that allows multiple threads (processes) to share a single computing resource, such as a core, simultaneously. This increases the efficiency with which the core is used. At the same time, multiple cores share chip resources, thereby improving their utilization.

Each server has 2 TByte of memory (8 GB per core), which makes these machines suitable for very-high-memory applications. For more information on the Sparc64 VII architecture, please check out this website.

Main Purpose
The main emphasis of these high-end shared-memory servers is to deliver the maximum possible floating-point performance without compromising on memory requirements. The large memory of these servers makes them ideally suited for large-scale computations. Large L2 caches keep memory latencies low, while chip multithreading technology increases core utilization.

Who Should Use these Machines
If you are just starting to run applications on our systems, we advise against using the M9000 servers as your platform. The servers have reached the end of their life and will be decommissioned during 2016. Their capacity will be replaced by high-memory systems of the x86/Linux type.
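The thread, memory, and peak-performance figures quoted under "Type of server" can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration: the variable names are made up for this example, and the peak figure assumes that each floating-point unit completes one fused multiply-add per clock cycle and that an FMA counts as two floating-point operations.

```python
# Hardware figures taken from the description of the M9000 servers.
SERVERS = 8
CHIPS_PER_SERVER = 64
CORES_PER_CHIP = 4
THREADS_PER_CORE = 2      # Chip Multi Threading: 2 hardware threads per core
CLOCK_GHZ = 2.52
FPUS_PER_CORE = 2
MEMORY_PER_CORE_GB = 8

# Assumption (not stated in the text): one FMA per FPU per cycle,
# counted as 2 floating-point operations (one add, one multiply).
FLOPS_PER_FMA = 2

cores_per_server = CHIPS_PER_SERVER * CORES_PER_CHIP
threads_per_server = cores_per_server * THREADS_PER_CORE
total_threads = SERVERS * threads_per_server
memory_per_server_gb = cores_per_server * MEMORY_PER_CORE_GB

# Peak rate per core in GFlops, then scaled up to the whole cluster in TFlops.
peak_gflops_per_core = CLOCK_GHZ * FPUS_PER_CORE * FLOPS_PER_FMA
peak_tflops_cluster = SERVERS * cores_per_server * peak_gflops_per_core / 1000

print(f"cores per server:   {cores_per_server}")
print(f"threads per server: {threads_per_server}")
print(f"threads in total:   {total_threads}")
print(f"memory per server:  {memory_per_server_gb} GB")
print(f"cluster peak:       {peak_tflops_cluster:.1f} TFlops")
```

Under these assumptions the result is 512 threads and 2048 GB per server, 4096 threads in total, and a cluster peak of roughly 20.6 TFlops, in line with the figures given above.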
Using the M9000 servers
Access
The servers are reached through the login node sflogin0, which can be accessed directly through ssh at IP address 130.15.59.64. They can also be accessed from the Secure Portal (dtterm (sfnode0) or xterm (sfnode0)), which brings you to the same (Solaris) login node. The file systems for all of our clusters are shared, so you will be using the same home directory. The login node can be used for compilation, program development, and testing only, not for production jobs.

Compiling code
Since the architecture of the Sparc64 VII chips of the M9000 servers differs in some important details from that of the login node, it may be a good idea to re-compile your code whenever possible. This is simple in most cases:
will switch to the (newer) update 3 compilers.
Running jobs
As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes. Production runs must be submitted to Grid Engine. For a description of how to use Grid Engine, see the GridEngine Help File. Grid Engine schedules jobs to a default pool of machines unless told otherwise. At present, this default pool contains only our M9000 nodes m9k0002-7, so you do not need to add any special script lines to have your jobs scheduled to these servers exclusively. Note that your jobs will run on dedicated threads, i.e. up to 512 processes can be scheduled to a single server. Grid Engine does the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
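To illustrate that no server-selection lines are needed, the following Python sketch assembles the text of a minimal Grid Engine job script. The directives -S, -cwd, -N, -o, -e and -pe are standard Grid Engine options, but everything else is a hypothetical placeholder chosen for this example: the helper render_job_script, the job name test_run, the program ./my_program, the shell path /bin/bash, and the parallel environment name example.pe. The GridEngine Help File is the place to look for the names that are valid on our systems.

```python
MAX_THREADS_PER_SERVER = 512  # hardware threads on one M9000 server


def render_job_script(name, command, slots=1, pe_name=None):
    """Return the text of a minimal Grid Engine job script (illustrative only)."""
    if not 1 <= slots <= MAX_THREADS_PER_SERVER:
        raise ValueError("slots must be between 1 and 512 for a single server")
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",    # shell used to run the job (assumed path)
        "#$ -cwd",            # run in the directory the job was submitted from
        f"#$ -N {name}",      # job name
        f"#$ -o {name}.out",  # file for standard output
        f"#$ -e {name}.err",  # file for standard error
    ]
    if slots > 1:
        if pe_name is None:
            raise ValueError("a parallel environment name is needed for slots > 1")
        # The parallel environment name is site-specific; the one passed in
        # below is a placeholder, not a real environment on these servers.
        lines.append(f"#$ -pe {pe_name} {slots}")
    # No queue or host selection line is added: the default pool
    # already consists of the M9000 nodes.
    lines.append(command)
    return "\n".join(lines) + "\n"


script = render_job_script("test_run", "./my_program", slots=8, pe_name="example.pe")
print(script)
```

The printed text could be saved to a file and submitted with qsub. The sketch rejects requests for more than 512 slots because that is the number of hardware threads on a single server.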
Further Help
For a more thorough review of the multi-core environment, please check out this PDF. You might want to follow some of the links provided in this document. We also supply user support (please contact us at cac.help@queensu.ca), so if you experience problems, we can assist you.