It turns out you can make the Top500 list with fewer than 100 nodes in your cluster. The NVIDIA DGX Superpod proves that point, using only 96 nodes to reach #22 on the June 2019 Top500 list. Of course, those 96 nodes are just one part of an overall machine that has a lot more going on.
NVIDIA DGX Superpod 96 Node Top500 Supercomputer
Built using 96 NVIDIA DGX-2H machines, each with 16 NVIDIA Tesla V100 GPUs, the NVIDIA DGX Superpod has a total of 1,536 GPUs. Here are the NVIDIA DGX-2H specs from our NVIDIA DGX-2H Now with 450W Tesla V100 Modules product announcement article.
NVIDIA says that the DGX Superpod took only three weeks to put together. The fabric for the NVIDIA DGX Superpod is, unsurprisingly, Mellanox. We are seeing a greater marketing emphasis on Mellanox collaboration points after NVIDIA agreed to acquire Mellanox earlier this year. On the call, NVIDIA said that each DGX-2H uses ten Mellanox 100Gbps cards for its network fabric: one NIC for each CPU plus one for every pair of NVIDIA Tesla V100 GPUs.
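As a quick sanity check on those numbers, here is a minimal Python sketch of the GPU and NIC arithmetic. The node count, GPUs per node, and NIC rule come from the figures above. The two-CPUs-per-node value is our assumption, chosen because it is what makes the ten-NIC total work out, and the function name is purely illustrative.

```python
# Figures stated in the article.
NODES = 96
GPUS_PER_NODE = 16

# Assumption: two CPUs per DGX-2H. This is not stated above; it is the
# value that makes the "ten NICs per node" figure add up.
CPUS_PER_NODE = 2


def nics_per_node(cpus: int, gpus: int) -> int:
    """One NIC per CPU plus one NIC per pair of GPUs."""
    return cpus + gpus // 2


total_gpus = NODES * GPUS_PER_NODE
node_nics = nics_per_node(CPUS_PER_NODE, GPUS_PER_NODE)
total_nics = NODES * node_nics

print(f"GPUs in the cluster: {total_gpus}")
print(f"NICs per node:       {node_nics}")
print(f"NICs in the cluster: {total_nics}")
```

Under that assumption, the rule gives two CPU NICs plus eight GPU-pair NICs per node, which matches the ten cards NVIDIA quoted.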
NVIDIA says the DGX Superpod is the supercomputer it uses for tasks like training models for self-driving cars. Taking a step back, this is NVIDIA, the hardware and software company, using its own tools to pursue AI breakthroughs as much as, or more than, traditional scientific computing applications. The industry has long said that AI development will lead to technological breakthroughs, so NVIDIA hopes that having a supercomputer with significant AI capabilities in-house can help it leapfrog in corporate discovery.
The NVIDIA DGX Superpod, despite having only a double-digit number of compute nodes, is still a 1-megawatt supercomputer. NVIDIA says it is about a 9.4 petaflop machine on the High-Performance Linpack benchmark.
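For a rough sense of scale, the sketch below turns those two headline figures into back-of-the-envelope efficiency numbers. Both inputs are the approximate values quoted above, so treat the results as ballpark estimates rather than measured figures. The per-GPU number simply divides the Linpack result by the 1,536 GPUs and ignores any CPU contribution.

```python
# Approximate figures quoted in the article.
HPL_PFLOPS = 9.4          # High-Performance Linpack result, in petaflops
POWER_MW = 1.0            # stated power envelope, in megawatts
TOTAL_GPUS = 96 * 16      # 96 nodes x 16 GPUs per node

# Convert to base units, then express as gigaflops per watt.
flops = HPL_PFLOPS * 1e15
watts = POWER_MW * 1e6
gflops_per_watt = flops / watts / 1e9

# Naive per-GPU share of the Linpack result, in teraflops.
tflops_per_gpu = HPL_PFLOPS * 1e3 / TOTAL_GPUS

print(f"Efficiency: ~{gflops_per_watt:.1f} GFLOPS per watt")
print(f"Per GPU:    ~{tflops_per_gpu:.2f} TFLOPS of Linpack")
```

Because the 1-megawatt figure is a round number, the efficiency estimate should be read as "roughly nine to ten gigaflops per watt" rather than as a precise value.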