NVIDIA H100 NVL for High-End AI Inference Launched

March 22, 2023

The NVIDIA H100 HVL may look like something we have seen before, but there is a big difference. We asked NVIDIA, and the company says that logically this is two GPUs to the OS, but that the NVLink will allow the full 188GB of memory to be used by the system.

NVIDIA H100 NVL for High-End AI Inference Launched

The new NVIDIA H100 NVL brings two NVIDIA H100 PCIe together with NVLink, and a twist. The new NVL version has 94GB of HBM3 memory per GPU for a total of 188GB. That likely means that the sixth 16GB stack is activated, but with only 14GB available for 94GB of the 96GB active.

What is really interesting as well is the TDP. These are 350W to 400W TDP PCIe cards. Generally, 300W is the top-end we see from most other vendors in PCIe cards since many servers cannot handle 400W in PCIe form factors. That is a big driver for higher-end OAM/ SXM form factors.

	H100 SXM	H100 PCIe	H100 NVL
FP64	34 teraFLOPS	26 teraFLOPS	68 teraFLOPs
FP64 Tensor Core	67 teraFLOPS	51 teraFLOPS	134 teraFLOPs
FP32	67 teraFLOPS	51 teraFLOPS	134 teraFLOPs
TF32 Tensor Core	989 teraFLOPS¹	756teraFLOPS¹	1,979 teraFLOPs¹
BFLOAT16 Tensor Core	1,979 teraFLOPS¹	1,513 teraFLOPS¹	3,958 teraFLOPs¹
FP16 Tensor Core	1,979 teraFLOPS¹	1,513 teraFLOPS¹	3,958 teraFLOPs¹
FP8 Tensor Core	3,958 teraFLOPS¹	3,026 teraFLOPS¹	7,916 teraFLOPs¹
INT8 Tensor Core	3,958 TOPS¹	3,026 TOPS¹	7,916 TOPS¹
GPU memory	80GB	80GB	188GB
GPU memory bandwidth	3.35TB/s	2TB/s	7.8TB/s
Decoders	7 NVDEC 7 JPEG	7 NVDEC 7 JPEG	14 NVDEC 14 JPEG
Max thermal design power (TDP)	Up to 700W (configurable)	300-350W (configurable)	2x 350-400W (configurable)
Multi-Instance GPUs	Up to 7 MIGS @ 10GB each		Up to 14 MIGS @ 12GB each
Form factor	SXM	PCIe Dual-slot air-cooled	2x PCIe Dual-slot air-cooled
Interconnect	NVLink: 900GB/s PCIe Gen5: 128GB/s	NVLink: 600GB/s PCIe Gen5: 128GB/s	NVLink: 600GB/s PCIe Gen5: 128GB/s

Based on the specs, it seems like, assuming the NVIDIA H100 NVL specs are for 400W, that the PCIe versions are vastly superior to the H100 SXM5 versions but without the higher-end 900GB/s NVLINK interfaces. The compute specs are 2x the H100 SXM, but the NVL version has more memory, higher memory bandwidth, and uses similar power for the performance.

Final Words

Our sense is that the NVL will have to be de-rated or the H100 SXM5 will need a spec bump soon to match. This is very strange positioning. Still, NVIDIA says that OpenAI that is using DGX A100’s now for ChatGPT can replace up to 10x DGX A100 systems with four sets of NVIDIA H100 NVL pairs to do its inferencing. It will be interesting to see whether these get de-rated or an updated H100 SXM5 over time.

5 COMMENTS

Matt March 22, 2023 At 6:28 am

“Based on the specs, it seems like, assuming the NVIDIA H100 NVL specs are for 400W, that the PCIe versions are vastly superior to the H100 SXM5 versions given they are 300W less but without the higher-end 900GB/s NVLINK interfaces.”

TDP is a design specification. Just because the SXM part can be configured to allow it to operate at 700 W doesn’t mean it needs 700 W to operate. A processor can be limited by various different things for its performance, power being just one of them. Just because a processor can draw 700 W doesn’t mean it needs to in order to perform a certain workload as fast as one that can only draw 400 W. The SXM form factor may allow for a much better cooling solution than the PCIE card. That would allow increased performance for workloads that are power-limited.
L.P. March 22, 2023 At 7:02 am

“The compute specs are 2x the H100 SXM, but the NVL version has more memory, higher memory bandwidth, and uses 57% the power for the same performance.”

I think there is an error in your reporting here. The specs quoted here are for the two cards **together**. As such the H100 SXM still enjoys a clear positioning in the market if you ask me.
Patrick Kennedy March 22, 2023 At 7:22 am

Good point folks. Let me fix that.
Tibo March 22, 2023 At 8:03 am

Just checked with Dell, they have them in new XE9680, but seams that lead time is a bit high…
emerth March 22, 2023 At 3:29 pm

Seems like a vram bump, and one has to buy in pairs to get it. Which is just more market segmentation in my books.

If, as the marketing vaguely implies it was transparent shared address space btwn the two GPU chips that’d be cool. But it actually looks like a pair of H100 connected by, OMG!, nvlink.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

NVIDIA H100 NVL for High-End AI Inference Launched

Final Words

RELATED ARTICLESMORE FROM AUTHOR

Supermicro NVIDIA GB200 NVL72 System at Computex 2024

Tenstorrent Wormhole Developer Kits Launched

AMD Ryzen AI 300 Series Launched

5 COMMENTS

LEAVE A REPLY

RELATED ARTICLES MORE FROM AUTHOR