NVIDIA DGX-2H Now with 450W Tesla V100 Modules

November 18, 2018

While the NVIDIA DGX-2 has been king of the deep learning / AI training world since its launch (see NVIDIA DGX-2 March 2018 launch), NVIDIA’s partners are catching up with their HGX-2 designs. Now, NVIDIA is upping its top-level offering with the NVIDIA DGX-2H. For this, NVIDIA is essentially adding faster processors and raising the TDP of the Tesla V100 GPUs even further.

The NVIDIA DGX-2H builds upon the DGX-2 platform to offer even more performance to NVIDIA’s customers as its OEM partners ramp up HGX-2 sales. Since the DGX-2H shares most of the same topology as the DGX-2, we can point you to our NVIDIA DGX-2 at Hot Chips 30 piece for more information on the basic platform.

NVIDIA DGX-2H Specs

While the NVIDIA DGX-2H is an evolution compared to the DGX-2, there are a few interesting morsels in their data sheets. Here is the side-by-side of the official data sheet PDFs:

It appears as though the major differences are:

Intel Xeon Platinum 8174 v. Intel Xeon Platinum 8168
12kW Maximum Power Consumption v. 10kW
Primary networking is via dual-port 10/25/40/50/100GbE instead of dual-port 10/25GbE
Weight has gone up 20lbs to 360lbs
GPUs run at 450W instead of 350W TDP
Maximum operating temperature decreases from 35C to 25C
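
To put the power figures in the list above in context, here is a rough back-of-the-envelope sketch in Python. It assumes 16 Tesla V100 GPUs per system, which is the DGX-2 GPU count but is not stated in the data sheet comparison above, and it treats the per-GPU TDP and the system maximum power as simple additive budgets. The function names are ours, for illustration only.

```python
# Data sheet figures quoted in the list above.
GPU_TDP_W = {"DGX-2": 350, "DGX-2H": 450}
SYSTEM_MAX_W = {"DGX-2": 10_000, "DGX-2H": 12_000}

# Assumption: 16 GPUs per system (not stated in the list above).
GPU_COUNT = 16


def gpu_power_w(system: str) -> int:
    """Total GPU power budget if every GPU runs at its rated TDP."""
    return GPU_COUNT * GPU_TDP_W[system]


def non_gpu_power_w(system: str) -> int:
    """Rated system maximum minus the GPU budget (CPUs, NVSwitch, fans, NICs, etc.)."""
    return SYSTEM_MAX_W[system] - gpu_power_w(system)


for name in ("DGX-2", "DGX-2H"):
    print(f"{name}: GPUs {gpu_power_w(name)} W, everything else {non_gpu_power_w(name)} W")

gpu_delta = gpu_power_w("DGX-2H") - gpu_power_w("DGX-2")
system_delta = SYSTEM_MAX_W["DGX-2H"] - SYSTEM_MAX_W["DGX-2"]
print(f"GPU TDP increase accounts for {gpu_delta} W of the {system_delta} W system increase")
```

Under that assumption, the GPU TDP bump alone would explain most of the jump from 10kW to 12kW, with the remainder left for the faster CPUs, networking, and cooling.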

What is extremely strange is that the rated performance has not moved despite the higher-frequency CPUs and the extra 100W of TDP on each NVIDIA Tesla V100 module. We have reached out to NVIDIA and will update this piece with a response on why this is still considered a 2PF machine despite the CPU and GPU updates.

[Update 19 November 2018 at 9:30 AM Pacific] We reached out to NVIDIA regarding the 2 petaflop number. NVIDIA said that it should be 2.1 petaflops and will be updated accordingly.
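
As a quick sanity check on the updated figure, the sketch below compares the relative gain in rated performance (2.0 to 2.1 petaflops) against the relative gain in GPU TDP (350W to 450W) and in maximum system power (10kW to 12kW). It uses only numbers quoted in this article; the helper name `relative_increase` is ours, and the petaflops-per-kW figure is a crude ratio of data sheet maximums rather than a measured efficiency.

```python
# Figures quoted in this article.
PEAK_PFLOPS = {"DGX-2": 2.0, "DGX-2H": 2.1}
GPU_TDP_W = {"DGX-2": 350, "DGX-2H": 450}
SYSTEM_MAX_KW = {"DGX-2": 10, "DGX-2H": 12}


def relative_increase(old: float, new: float) -> float:
    """Fractional increase going from old to new."""
    return (new - old) / old


perf_gain = relative_increase(PEAK_PFLOPS["DGX-2"], PEAK_PFLOPS["DGX-2H"])
tdp_gain = relative_increase(GPU_TDP_W["DGX-2"], GPU_TDP_W["DGX-2H"])
power_gain = relative_increase(SYSTEM_MAX_KW["DGX-2"], SYSTEM_MAX_KW["DGX-2H"])

print(f"Rated performance gain: {perf_gain:.1%}")
print(f"GPU TDP gain:           {tdp_gain:.1%}")
print(f"System max power gain:  {power_gain:.1%}")

# Crude ratio of rated petaflops to rated maximum system power.
for name in ("DGX-2", "DGX-2H"):
    ratio = PEAK_PFLOPS[name] / SYSTEM_MAX_KW[name]
    print(f"{name}: {ratio:.3f} PFLOPS per kW of rated maximum power")
```

On these data sheet numbers, the rated performance rises far less than the power budget does, which is consistent with higher clocks costing disproportionately more power.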

Final Words

This is a big deal. You may have seen AMD Radeon Instinct MI60 numbers that were compared to a 250W PCIe Tesla V100. Most DGX-1 class offerings run the SXM2 NVIDIA Tesla V100s at 300W. The DGX-2 ran these V100s at 350W TDP. Now the NVIDIA DGX-2H ups this to 450W TDP.

It was not long ago that accelerators had TDPs in the 225W-300W range. Now, we are seeing 450W components. Extra TDP usually yields better performance, so we are not sure why this figure has not moved. At the same time, 450W is a sign of things to come. In a 10U chassis that is rated to consume 12kW, air cooling can be an option. For denser HPC applications, cooling something like this will require liquid cooling. We are not sure if extra cooling is the reason behind the 20lb weight gain, but the Intel Xeon Platinum SKU change should not alter the weight, and the dual-port networking change should make a negligible difference.

1 COMMENT

Misha Engel December 5, 2018 At 5:58 pm

I have a feeling that AMD will beat this configuration with EPYC2+VEGA20 in the same 10U form factor at 12 kW.
For a price of around $300k incl. 4 TB of memory, 15 Mellanox ConnectX-6 and a Mellanox QM8700 switch.
With Tensor the DGX-2H might still be around 15% faster, with fp16, 32 and 64 the AMD system will be around 70% faster.
