MLPerf Training v3.1 results are out, and this is probably one of the least exciting rounds we have seen in some time, except for two results. First, NVIDIA is using MLPerf to show off its latest Hopper-generation supercomputer. Second, in an uncommon occurrence, both Intel and NVIDIA trumpeted the Intel Gaudi2 accelerator’s excellent price/performance.
Intel Scores a Huge Gaudi2 Win in NVIDIA MLPerf Training v3.1
There were 209 results across 83 different configurations between the Closed and Open divisions. The Open division only had two configurations submitted, each with a single but different result. MLPerf Training v3.1 has nine tests, so the average completion ratio of the benchmark suite is only around 28% for each submitted configuration (209 results out of a possible 83 x 9 = 747). Only 14 of the 209 results used anything other than NVIDIA GPUs. Six of those were Intel showing off Intel Xeon Sapphire Rapids performance in AI, setting a baseline for CPU performance in the industry.
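For readers who want to check the math, here is a minimal Python sketch of the completion ratio. The counts come from the paragraph above; the variable names are ours and purely illustrative.

```python
# Counts quoted in the article for MLPerf Training v3.1.
total_results = 209
configurations = 83
benchmarks_in_suite = 9
non_nvidia_results = 14

# If every configuration ran every benchmark, this many results would exist.
possible_results = configurations * benchmarks_in_suite

completion_ratio = total_results / possible_results
non_nvidia_share = non_nvidia_results / total_results

print(f"Possible results: {possible_results}")
print(f"Average completion ratio: {completion_ratio:.1%}")
print(f"Non-NVIDIA share of results: {non_nvidia_share:.1%}")
```

That works out to a little under 28% completion, with under 7% of all results coming from non-NVIDIA hardware.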
The first big takeaway is that NVIDIA is showing off its new supercomputer. Dubbed NVIDIA Eos, this is a 10,752 H100 GPU system connected via 400Gbps Quantum-2 InfiniBand. Putting this into some perspective, if a company were to buy this on the open market, it would likely be a $400M+ USD system.
With its giant supercomputer, NVIDIA can win on the MLPerf Training suite across varying GPU numbers. It even matched the Microsoft Azure cloud running at 10,752 NVIDIA H100 GPUs.
In general, so few other companies are submitting results to MLPerf Training that NVIDIA had to make up a summary slide showing that very few others are using its MLPerf Training suite. We know from AMD earnings that it is shipping MI300 variants, which would be “commercially available” for MLPerf. We also know that companies like Cerebras sold a $1B(+) Wafer-Scale Cluster without MLPerf. MLPerf Inference is often fascinating for vendor-to-vendor comparisons. MLPerf Training has lost that, except for one instance.
Looking at the Stable Diffusion test above, one might see NVIDIA as being around 8x faster than Intel Gaudi2. Then we get to the details. NVIDIA uses 16 times as many H100 GPUs to get that kind of speedup. Putting that into perspective, with each H100 costing more than twice as much as a Gaudi2, we would expect that it is well over 32x the cost to go 8x faster. Put another way, NVIDIA’s marketing slide here shows Intel is somewhere between relatively close and 4x better performance per dollar versus NVIDIA (~8x the performance for ~32x the cost).
In a direct comparison with 64 accelerators, NVIDIA is around twice as fast. That is an excellent result for Intel since it uses accelerators that cost less than half as much as their NVIDIA counterparts and has a much simpler system architecture.
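The performance-per-dollar reasoning above can be sketched in a few lines of Python. This is a rough model, not vendor pricing: the 2x unit cost ratio is an assumption based on our understanding that a Gaudi2 costs less than half as much as an H100, and the function name and parameters are hypothetical.

```python
def perf_per_dollar_advantage(speedup, accelerator_ratio, unit_cost_ratio):
    """Return how many times better the cheaper system's perf per dollar is.

    speedup: how much faster the larger/pricier system is (e.g. 8 for 8x).
    accelerator_ratio: how many times more accelerators it uses.
    unit_cost_ratio: assumed price of one of its accelerators relative to
        one of the cheaper system's accelerators (an assumption, not a quote).
    """
    total_cost_ratio = accelerator_ratio * unit_cost_ratio
    return total_cost_ratio / speedup

# Stable Diffusion slide: ~8x faster using 16x the accelerators.
slide_case = perf_per_dollar_advantage(speedup=8, accelerator_ratio=16, unit_cost_ratio=2)

# Same-size comparison: 64 vs. 64 accelerators, ~2x faster.
matched_case = perf_per_dollar_advantage(speedup=2, accelerator_ratio=1, unit_cost_ratio=2)

print(f"Slide comparison: {slide_case:.1f}x")
print(f"64 vs. 64 comparison: {matched_case:.1f}x")
```

With a 2x unit cost ratio, the slide comparison comes out to 4x in Gaudi2’s favor, and the 64-accelerator comparison comes out roughly even. Any cost ratio above 2x, or lower total system costs, tips both cases further toward Intel.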
NVIDIA is further along in tuning its H100 than Intel is with Gaudi2, but Intel’s big delivery in this MLPerf round was FP8 support. That led to a ~2x speedup in performance.
Intel now has an exciting play. The Gaudi2 is seemingly somewhere between A100 and H100 performance. Still, from what we understand, it costs less than half as much as NVIDIA’s H100 part on an accelerator-to-accelerator basis, and the gap can be even larger in total system costs.
Last week, we looked at some Gaudi2 systems in our Inside the Intel Developer Cloud tour.
Intel Gaudi2 is getting enough traction that it is supply-constrained, but less so than the H100.
We are going to give a special mention to the NVIDIA L40S that made a single appearance in this set. We just went into why the L40S is becoming an alternative where one can use more L40S GPUs given their cost and availability advantage over the H100. Check out the NVIDIA L40S vs H100 PCIe piece here. The MLPerf results seemed to track that story.
An honorable mention also goes to Google for its TPUv5e submission. There is not much to compare it to directly, given it is cloud-only, but at least they submitted something.
It feels like most of the industry other than NVIDIA has abandoned MLPerf Training, save for a few Intel results and a lone Google result. The NVIDIA H100 SXM5 is the big GPU today, and few use MLPerf to challenge that. There are only a handful of companies that can scale their hardware and software past 1000 accelerators, so at the top end, there is not a lot of competition. If you want 10,000 accelerators six months from now, most likely, you are buying an NVIDIA H100 SXM5 system.
With the NVIDIA Eos supercomputer, NVIDIA has something in-house that would have a retail value of over $400M. Perhaps no other company in the industry can afford to match that at this point. Of course, that is retail value, not what NVIDIA is paying for its own parts, but it is a big machine.
A fun result of NVIDIA’s race to scale is that it showed Intel’s Gaudi2 can deliver up to roughly four times the Stable Diffusion performance per dollar. NVIDIA was trying to show it had a much longer bar on the chart, but in the race for a longer bar, it lost scaling efficiency and gave Intel a big win.