NVIDIA A100 4x GPU HGX Redstone Platform

Supermicro HGX A100 4x GPU Board Redstone Cover

While the 8x NVIDIA A100 GPU “Delta” platform with NVSwitch got a lot of airtime during the Ampere launch, it was not the only platform being launched today by NVIDIA. The 4x GPU “Redstone” platform is a smaller NVLink mesh platform that is designed to be a lower-cost option.

NVIDIA A100 4x GPU HGX Redstone Platform

The NVIDIA A100 “Redstone” HGX platform is important since it is a smaller and less complex version of the HGX A100 platform. The Redstone platform incorporates four SXM NVIDIA A100 GPUs onto a single PCB. As we saw with the Tesla A100 overview, the new GPUs have 12 NVLinks per GPU. Each NVLink provides 50GB/s of GPU-to-GPU bandwidth, for 600GB/s total per GPU.

NVIDIA A100 NVLink Bandwidth
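The per-GPU total quoted above is simple multiplication. As a minimal sketch using only the figures from this article (the constant and function names are our own, chosen for illustration):

```python
# Figures quoted in the article for the A100.
NVLINKS_PER_GPU = 12
GB_PER_S_PER_LINK = 50


def total_nvlink_bandwidth(links: int = NVLINKS_PER_GPU,
                           per_link: int = GB_PER_S_PER_LINK) -> int:
    """Aggregate NVLink bandwidth for one GPU, in GB/s."""
    return links * per_link


print(f"{total_nvlink_bandwidth()} GB/s per GPU")
```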

Redstone takes those 12 NVLinks and splits them into three groups. Instead of the NVIDIA NVSwitch solution we see on the HGX A100 platform, we get a mesh topology without switching. NVIDIA has offered both switched and non-switched systems for some time.

NVIDIA NVSwitch System

NVIDIA has been using this type of topology for years, and it is the basis for many important compute nodes. For example, Summit uses NVLink directly attached between Tesla V100 GPUs. In a four-GPU mesh such as Redstone, each GPU can talk directly to every other GPU.

IBM Power9 Talk At Hot Chips 31 OpenCAPI And NVLink Accelerator Bandwidth
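To make the four-GPU mesh concrete, here is a rough sketch of how 12 links per GPU could divide among three peers. The article only says the links are split into three groups, so the even split of four links per peer is an assumption here, as are all of the names in the code:

```python
from itertools import combinations

GPUS = 4                 # GPUs on a Redstone board (from the article)
LINKS_PER_GPU = 12       # NVLinks per A100 (from the article)
GB_PER_S_PER_LINK = 50   # per-link bandwidth (from the article)


def full_mesh(gpus: int, links_per_gpu: int) -> dict:
    """Map every GPU pair to its link count, assuming an even split."""
    peers = gpus - 1
    if links_per_gpu % peers:
        raise ValueError("links do not divide evenly among peers")
    links_per_pair = links_per_gpu // peers
    return {pair: links_per_pair for pair in combinations(range(gpus), 2)}


mesh = full_mesh(GPUS, LINKS_PER_GPU)
for (a, b), links in mesh.items():
    print(f"GPU{a} <-> GPU{b}: {links} links, {links * GB_PER_S_PER_LINK} GB/s")
```

Under that assumption every pair of GPUs has its own direct group of links, which is why no switch is needed at this scale.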

The importance of Redstone is that the smaller HGX A100 4 GPU board uses much less power due to having fewer GPUs and omitting NVSwitch. Leaving NVSwitch out also saves on per-node system costs. If you simply want to run up to 7 MIG instances per A100, or to use up to 4 A100s per instance, then this topology can make a lot more sense. Supermicro and other vendors are adding A100 4 GPU systems to their portfolios.

Supermicro HGX A100 4x GPU Board Redstone Example

As with the larger HGX A100 option, these GPUs have PCIe Gen4 connectivity to their hosts. One can use the HGX A100 4 GPU Redstone board with Intel Xeon Scalable. However, to get full PCIe Gen4 performance, one needs to use the AMD EPYC 7002 family or potentially an emerging Arm or POWER option.
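For a sense of what is at stake with the host link, here is a back-of-the-envelope sketch of theoretical PCIe bandwidth. The x16 link width is an assumption for illustration, since this article does not state it. The signaling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding come from the PCIe specifications, and the function name is our own:

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Approximate one-direction PCIe bandwidth in GB/s.

    Assumes 128b/130b encoding (Gen3 and Gen4) and ignores
    protocol overhead, so real throughput will be lower.
    """
    return gt_per_s * lanes * (128 / 130) / 8


gen3 = pcie_gb_per_s(8.0)    # Gen3 signaling rate
gen4 = pcie_gb_per_s(16.0)   # Gen4 signaling rate
print(f"Gen3 x16: {gen3:.2f} GB/s, Gen4 x16: {gen4:.2f} GB/s")
```

On these theoretical numbers, a Gen3 host offers roughly half the per-direction bandwidth of a Gen4 host on the same link width.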

You can see the Supermicro Redstone platform based on the AMD EPYC 7002 Rome series here:

Supermicro HGX A100 4x GPU Redstone Platform

This is a great example of how server OEMs can take the HGX A100 4 GPU platform and innovate to provide their own feature sets around the new Ampere generation.

Final Words

For many organizations, the 4x GPU mesh architecture has made a lot of sense. The new HGX A100 4 GPU Redstone platform makes integration of these solutions much easier, but it also moves some of the design differentiation away from NVIDIA’s partners. Still, this seems to make sense from an industry perspective. Other companies, such as Dell, have focused on these 4x Tesla GPU compute nodes for their customers instead of pushing larger solutions. For customers who want the smaller, less costly, and less complex form factor, Redstone makes a lot of sense.

3 COMMENTS

  1. Jon

    Not a fully enabled product yet – and would be my guess that yes the 6th stack would be for sideband/native HBM ECC.

  2. Any ideas on how much it costs? The HGX-2 costs $200k. Wonder if Redstone is in the realm of small startup budget.
