Hardware
The HPCC maintains a large number of computing resources designed to support research.
In general, users write their programs and then submit a request (a job) to the scheduling system to run them on the cluster. The job request specifies how much time and which computing resources are needed. Because these resources are shared, jobs that overutilize the system and cause nodes to become unresponsive may be terminated without prior notice.
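As an illustration, a minimal job script might look like the sketch below, assuming the SLURM scheduler (which generally honors the interpreter named in the script's shebang line, so a Python script can serve as the batch script). The job name, time limit, and resource values are placeholders to be adapted to the actual workload.

#!/usr/bin/env python3
#SBATCH --job-name=example_job      # name shown in the queue (placeholder)
#SBATCH --time=01:00:00             # requested wall-clock time
#SBATCH --nodes=1                   # number of nodes
#SBATCH --ntasks=1                  # number of tasks (processes)
#SBATCH --cpus-per-task=4           # CPU cores per task
#SBATCH --mem=8G                    # memory per node

# Everything below runs on the allocated compute node(s).
import platform

def main():
    print(f"Job running on node: {platform.node()}")
    # ... actual research code would go here ...

if __name__ == "__main__":
    main()

A script like this would typically be submitted with sbatch and monitored with squeue; jobs that exceed their requested time or memory are generally ended by the scheduler.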
Compute Cluster
ICER manages MSU’s High Performance Computing Center (HPCC), which maintains four clusters available to MSU researchers as a free, shared resource. These clusters comprise a total of 1,047 nodes with 56,236 CPU cores, 614 GPUs (including NVIDIA K20, K80, V100, and V100S models), and 317 TB of memory. The theoretical peak speed of the entire system is approximately 3.9 petaflops for double-precision floating-point operations (roughly twice that for single-precision workloads). The nodes are connected via low-latency InfiniBand FDR (56 Gbit), EDR/HDR100 (100 Gbit), and HDR (200 Gbit) links and share high-speed parallel file systems based on Lustre and GPFS, with a total capacity of over 8 petabytes for persistent and temporary storage and aggregate performance above 100 gigabytes/sec. See the User Documentation for more information.
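As a rough sketch of where a peak figure like 3.9 petaflops comes from, the example below multiplies the core and GPU counts by assumed per-core and per-GPU peak rates. The clock speed, FLOPs-per-cycle, and per-GPU values are illustrative assumptions for a mixed fleet, not published specifications of the HPCC hardware.

# Back-of-the-envelope estimate of theoretical peak (double precision).
# The per-core and per-GPU rates below are illustrative assumptions only.

CPU_CORES = 56_236          # total CPU cores (from the cluster description)
GPUS = 614                  # total GPUs (from the cluster description)

CLOCK_HZ = 2.4e9            # assumed average core clock
FLOPS_PER_CYCLE = 16        # assumed DP FLOPs per cycle (e.g. AVX2 with FMA)
GPU_PEAK_FLOPS = 3.0e12     # assumed average DP peak per GPU (K20 through V100S mix)

cpu_peak = CPU_CORES * CLOCK_HZ * FLOPS_PER_CYCLE
gpu_peak = GPUS * GPU_PEAK_FLOPS
total_petaflops = (cpu_peak + gpu_peak) / 1e15

print(f"CPU peak:   {cpu_peak / 1e15:.2f} PFLOPS")
print(f"GPU peak:   {gpu_peak / 1e15:.2f} PFLOPS")
print(f"Total peak: {total_petaflops:.2f} PFLOPS")  # on the order of 4 PFLOPS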
Interactive Development Hardware
For each hardware type in the compute cluster, a single node is set aside for software development and testing. Users can connect directly to these nodes via SSH through the HPCC gateway. These nodes are shared resources, and programs may run for up to two CPU hours before being terminated; longer jobs should be submitted to the cluster.
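Because the limit on development nodes is expressed in CPU time rather than wall-clock time, a test program can track its own CPU usage and stop cleanly before being terminated. The sketch below uses Python's standard resource module to do this; the two-hour limit matches the policy above, while the safety margin is an arbitrary choice.

import resource

CPU_LIMIT_SECONDS = 2 * 60 * 60   # two CPU hours allowed on development nodes

def cpu_seconds_used() -> float:
    """Return total user + system CPU time consumed by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

def should_stop(safety_margin: float = 300.0) -> bool:
    """True when the process is within safety_margin seconds of the limit."""
    return cpu_seconds_used() >= CPU_LIMIT_SECONDS - safety_margin

# Example use: call should_stop() periodically inside a long test loop and
# checkpoint or exit early rather than being cut off by the node's limit.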