MLPerf Inference v3.1 Shows NVIDIA Grace Hopper and a Cool AMD EPYC Win

NVIDIA GH200 Refresh

MLPerf Inference v3.1 is out. The data center side of the benchmark result suite is still mostly an NVIDIA affair, with a few Qualcomm Cloud AI 100 and Intel results. Still, the two most interesting results are the NVIDIA GH200 Grace Hopper and the Google TPU v5e.

MLPerf Inference v3.1

MLPerf Inference v3.1 is mostly an edge affair. Thousands of results come from software companies making small tuning changes on edge platforms. Instead, we generally focus on the data center results.

Intel Xeon Max Chip

Of the 65 data center closed results and two preview results, there are eight Intel Xeon Platinum 8480+ results, one Intel Xeon Max 9480 result, and one Habana Gaudi2 result. Qualcomm was back with five Cloud AI 100 results. Google had a TPU v5e result that is really interesting for more than just the accelerators. Still, not every configuration was submitted across every benchmark, so one way to look at this is that under 25% of the configurations were non-NVIDIA, yet well over 85% of the total benchmark results were NVIDIA, making the other submissions almost rounding errors.
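As a quick sanity check on that configuration share, here is a minimal sketch using only the submission counts cited above. The breakdown into a dictionary is just for illustration; the official MLCommons results spreadsheet is the authoritative tally.

```python
# Rough share math for the data center closed + preview submissions,
# using the configuration counts cited in this article.
total_configs = 65 + 2  # 65 closed results + 2 preview results

non_nvidia_configs = {
    "Intel Xeon Platinum 8480+": 8,
    "Intel Xeon Max 9480": 1,
    "Habana Gaudi2": 1,
    "Qualcomm Cloud AI 100": 5,
    "Google TPU v5e": 1,
}

non_nvidia = sum(non_nvidia_configs.values())  # 16
share = non_nvidia / total_configs
print(f"Non-NVIDIA configurations: {non_nvidia}/{total_configs} = {share:.1%}")
# -> Non-NVIDIA configurations: 16/67 = 23.9%, i.e. under 25%
```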

NVIDIA submitted single H100 80GB results as well as NVIDIA GH200 Grace Hopper results. The GH200 results were ~2-17% faster, with an average of just over 9%. There are, of course, some major differences, with the CPU directly connected to the GPU instead of an SXM setup. Still, NVIDIA is setting up a case to say “NVIDIA GPUs work best with NVIDIA CPUs” in the future so it can push Intel and AMD out of AI servers.
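To make the comparison concrete, here is a minimal sketch of how per-benchmark and average speedups like that are derived. The throughput numbers below are placeholders chosen to roughly match the ~2-17% spread, not the actual MLPerf v3.1 submissions.

```python
# Hypothetical queries/second figures for an H100 SXM system and a GH200
# system on the same benchmarks. Placeholder values, not official results.
h100 = {"resnet": 100_000, "bert": 7_000, "gptj": 13.0}
gh200 = {"resnet": 102_000, "bert": 7_600, "gptj": 15.2}

# Per-benchmark speedup as a fraction over the H100 baseline
speedups = {name: gh200[name] / h100[name] - 1.0 for name in h100}
for name, s in speedups.items():
    print(f"{name}: GH200 is {s:+.1%} vs. H100")

avg = sum(speedups.values()) / len(speedups)
print(f"average speedup: {avg:.1%}")  # ~9% with these placeholder numbers
```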

AMD EPYC 9004 Genoa With Milan Rome Intel Xeon Ice Lake Sapphire Rapids 13th Gen Core Ampere Altra Max

The other interesting announcement was the Google TPU v5e. Google outlined the TPUv4 at Hot Chips 2023, and Google usually only talks about hardware once it has newer versions in production. Still, the new v5e result had an interesting configuration detail: the host CPU was the AMD EPYC 7B13. That means the AMD Socket SP3 Rome/Milan generation dominated as the host CPU for the NVIDIA A100 and Google TPU v5e generation of accelerator platforms.

As an honorable mention, the Moffett accelerators made a debut, but because of their software platform, they do not appear in the data center closed results.

Final Words

Overall, there is not much to learn from the new MLPerf Inference results. Any AI-related topic is hot these days, yet companies like Cerebras are selling $1B AI clusters without submitting MLPerf results. Qualcomm still has its chips. Intel is positioning for AI inference on CPUs. It is not a huge secret in the industry that Intel has big Gaudi2 demand, despite the accelerator only being submitted on one of seven benchmarks.
