HPC Computing Components

Compute Nodes

Compute nodes are the foundation of a computer cluster and do the actual number-crunching. They are servers mounted in a rack, connected by a network, and configured by cluster management software to work together as a single system (a minimal sketch of this cooperation follows the list below).

There are three main types of compute nodes:

  1. Rack Servers - standard computers built for a wide variety of applications.
  2. GPU Servers - rack servers fitted with one or more GPU (Graphics Processing Unit) cards that offload highly parallel work from the CPU, often accelerating applications dramatically.
  3. High-Density Servers - servers modified to fit more nodes in the same amount of rack space (these can be either Twin or Blade servers).
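
To make the idea of nodes working together as one unit concrete, here is a minimal sketch using MPI through the mpi4py Python package (assumed, along with an MPI library, to be installed on every node); each process reports its rank and host name, then all ranks take part in a simple collective operation.

    from mpi4py import MPI   # assumes an MPI library and mpi4py are installed cluster-wide
    import socket

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()    # this process's ID within the job
    size = comm.Get_size()    # total number of processes in the job

    print(f"Rank {rank} of {size} running on {socket.gethostname()}")

    # A simple collective operation: sum the rank numbers across every process.
    total = comm.reduce(rank, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"Sum of all ranks: {total}")

Launched with something like mpirun -np 64 python hello_cluster.py, the cluster software starts one copy of the script per core or per node, and the ranks cooperate as a single program.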

InfiniBand offers higher bandwidth and far lower latency than Ethernet, which makes it the preferred interconnect for HPC clusters, though it is also more expensive. Applications that require the maximum possible performance - such as oil and gas exploration modeling and financial computation - should build their clusters on InfiniBand.
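
A rough back-of-the-envelope comparison shows why latency matters as much as bandwidth. The figures below - 10Gb/s Ethernet with roughly 50 microseconds of latency versus 40Gb/s InfiniBand with roughly 1.5 microseconds - are illustrative assumptions, not measurements of any particular hardware.

    def transfer_time_us(message_bytes, latency_us, bandwidth_gbps):
        """Approximate one-way transfer time: latency plus size divided by bandwidth."""
        bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6   # bytes per microsecond
        return latency_us + message_bytes / bytes_per_us

    for size in (1_024, 1_048_576):   # a 1 KiB message and a 1 MiB message
        eth = transfer_time_us(size, latency_us=50.0, bandwidth_gbps=10.0)
        ib = transfer_time_us(size, latency_us=1.5, bandwidth_gbps=40.0)
        print(f"{size:>9} bytes: Ethernet ~{eth:6.1f} us, InfiniBand ~{ib:6.1f} us")

For small messages the fixed latency dominates the total time, which is why tightly coupled simulations that exchange many small messages benefit most from the lower-latency fabric.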

Head Node

The head node is the server that manages the cluster. Its hardware is not substantially different from that of the compute nodes, but the head node stores the operating system (OS) and cluster middleware for the entire cluster, and it handles network and job management for the whole system.
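
As an illustration of job management from the head node, the sketch below assumes a Slurm-style workload manager; the job script contents, the my_solver binary, and input.dat are hypothetical placeholders. The scheduler on the head node accepts the job and decides which compute nodes run it.

    import getpass
    import subprocess
    import textwrap

    # Hypothetical batch job: "my_solver" and "input.dat" are placeholders.
    job_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=demo
        #SBATCH --nodes=4
        #SBATCH --ntasks-per-node=16
        mpirun ./my_solver input.dat
    """)

    with open("demo.sbatch", "w") as f:
        f.write(job_script)

    # The scheduler running on the head node queues the job and assigns compute nodes.
    subprocess.run(["sbatch", "demo.sbatch"], check=True)
    subprocess.run(["squeue", "-u", getpass.getuser()], check=False)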

Storage

Storage servers do not need computational resources (processors and RAM) as powerful as those of compute nodes because they serve a different function: they hold the large collections of files - a database, for example - that compute nodes access as needed.

Because of their role in the cluster, storage servers must sustain high I/O rates so that the compute nodes can quickly read and write data to and from the drives. No matter how fast the compute nodes are, an I/O bottleneck in the storage servers can drag down the performance of the entire cluster.
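
A quick sizing example makes the point. Suppose 64 compute nodes each write an 8GB checkpoint and the job should not wait on storage for more than two minutes; every figure here is an assumption chosen for illustration.

    nodes = 64                    # compute nodes writing in parallel (assumed)
    checkpoint_gb_per_node = 8    # checkpoint size per node, in GB (assumed)
    window_s = 120                # acceptable time to finish the checkpoint (assumed)

    total_gb = nodes * checkpoint_gb_per_node
    required_gb_per_s = total_gb / window_s
    print(f"Aggregate checkpoint size: {total_gb} GB")
    print(f"Required storage throughput: {required_gb_per_s:.1f} GB/s")

That works out to roughly 4.3GB/s of sustained write bandwidth - far more than a single disk or controller can deliver, which is why the storage tier is engineered around aggregate I/O.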

One of our specialties is finding and offering the best storage configurations on the market that help overcome I/O bottlenecks. There are several technologies that make our cluster storage solutions optimal:

  1. 6Gb/s SAS RAID controllers provide twice the per-link bandwidth of the older 3Gb/s models that many clusters are still running, allowing faster reads and writes to the drives.
  2. LSI's CacheCade technology extends a RAID controller's cache with up to 500GB of solid-state drives (SSDs), drastically increasing I/O rates, especially for database workloads.
  3. Active-active controllers, available on select storage systems, give two RAID controllers shared access to the same storage, so data remains available even if one controller fails.
  4. The open-source Lustre parallel file system is the leading storage software for HPC and is found in everything from small workgroup clusters to some of the largest supercomputers in operation today (a short striping sketch follows this list).
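
The sketch below illustrates the idea behind Lustre-style striping: a file is cut into fixed-size stripes that are distributed round-robin across several object storage targets (OSTs), so many storage servers share the load of a single large file. The stripe count and stripe size here are illustrative assumptions, not recommended settings.

    stripe_count = 4              # number of OSTs the file is spread over (assumed)
    stripe_size = 1 << 20         # 1 MiB per stripe (assumed)
    file_size = 8 * (1 << 20)     # an 8 MiB file, kept small for brevity

    # Each consecutive stripe-sized chunk of the file lands on the next OST,
    # wrapping around, so reads and writes are spread over many servers.
    for offset in range(0, file_size, stripe_size):
        ost = (offset // stripe_size) % stripe_count
        print(f"bytes {offset:>8}-{offset + stripe_size - 1:>8} -> OST {ost}")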

Component Reference

Ethernet Switch

Connects all of the nodes over the Ethernet network and allows for centralized cluster management.

InfiniBand Switch

Connects all of the nodes over the InfiniBand fabric, providing the low-latency, high-bandwidth path for application and storage traffic.

SAS Switch

Allows compute nodes to access data stored on JBOD (just a bunch of disks) enclosures more efficiently.

KVM Switch

Provides keyboard, video, and mouse access to all cluster nodes, allowing them to be managed from a remote location.

Head Node

The node tasked with running the middleware that manages and updates all of the cluster resources.

PDU

The power distribution unit distributes the electricity that each node and switch requires to operate.

Blade Compute Nodes

High-density modular servers are powerful, efficient, and easy to manage.

Compute Nodes

1U rack servers frequently serve as the compute nodes in an HPC cluster, and their performance may be accelerated with extra physical processing cores or GPUs.

Storage Nodes

Storage nodes hold the often enormous volumes of data generated by HPC clusters and allow compute nodes to read and write that data efficiently.