NVIDIA EOS A Top 10 Supercomputer Shown

NVIDIA EOS 2024 02 15

Today, NVIDIA showed a bit more of its Top 10 supercomputer. NVIDIA EOS currently sits at #9 on the November 2023 Top500 list, which ranks the most powerful supercomputers by double-precision Linpack performance. That is notable because NVIDIA de-emphasized double-precision gains in favor of AI performance generations ago. Hitting the top 10 in a benchmark the system was not designed for is a big achievement, and it shows just how massive this AI supercomputer is.


NVIDIA EOS has 576 NVIDIA DGX H100 systems and uses NVIDIA Quantum-2 400Gb/s InfiniBand. It delivers an Rmax of 121.4 PFLOPS on double-precision Linpack, but 18.4 exaflops of FP8 AI compute.
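To put the gap between those two figures in perspective, here is a quick back-of-the-envelope sketch. It only uses the two numbers quoted above. Keep in mind that the Linpack Rmax is a measured result while the FP8 figure is a vendor-quoted number, so this is not a like-for-like comparison. The variable and function names are ours, purely for illustration.

```python
# Figures quoted in the article.
RMAX_FP64_FLOPS = 121.4e15   # 121.4 PFLOPS, measured double-precision Linpack
FP8_AI_FLOPS = 18.4e18       # 18.4 exaflops, quoted FP8 AI compute


def fp8_to_fp64_ratio(fp8_flops: float, fp64_flops: float) -> float:
    """Return how many times larger the FP8 figure is than the FP64 figure."""
    return fp8_flops / fp64_flops


ratio = fp8_to_fp64_ratio(FP8_AI_FLOPS, RMAX_FP64_FLOPS)
print(f"Quoted FP8 compute is roughly {ratio:.0f}x the FP64 Linpack Rmax")
```

A ratio of roughly 150x helps explain why a system built for AI can still land in the Top500 top 10 almost as a side effect.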

NVIDIA EOS Fiber 2024 02 15

Given that there are 576 DGX H100 systems with eight GPUs each, we have 4608 GPUs, which is likely well over $200M if you were trying to put it together at street pricing.
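The arithmetic behind that paragraph can be sketched in a few lines. Two assumptions are ours: eight GPUs per DGX H100 (which matches 4608 / 576), and treating $200M as a floor rather than an actual price. The per-system cost below is only what that floor implies, not a quote, and the Rmax and FP8 constants are the figures NVIDIA quotes for EOS.

```python
# Figures quoted in the article.
DGX_SYSTEMS = 576
RMAX_FP64_FLOPS = 121.4e15   # 121.4 PFLOPS Linpack Rmax
FP8_AI_FLOPS = 18.4e18       # 18.4 exaflops FP8 AI compute
COST_FLOOR_USD = 200e6       # "well over $200M", treated here as a floor

# Assumption: eight H100 GPUs per DGX H100 system (4608 / 576 = 8).
GPUS_PER_DGX = 8

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX

# Per-GPU share of the cluster-level figures.
rmax_per_gpu_tflops = RMAX_FP64_FLOPS / total_gpus / 1e12
fp8_per_gpu_pflops = FP8_AI_FLOPS / total_gpus / 1e15

# What the $200M floor implies per DGX system (not an actual price).
cost_floor_per_system = COST_FLOOR_USD / DGX_SYSTEMS

print(f"Total GPUs:               {total_gpus}")
print(f"Linpack Rmax per GPU:     {rmax_per_gpu_tflops:.1f} TFLOPS")
print(f"Quoted FP8 per GPU:       {fp8_per_gpu_pflops:.2f} PFLOPS")
print(f"Implied floor per system: ${cost_floor_per_system:,.0f}")
```

The per-system number also covers networking, storage, and everything else in the cluster, so it should be read as a rough floor for the whole build divided by node count.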

NVIDIA EOS Racks 2 2024 02 15

NVIDIA is using its SuperPOD architecture to build EOS in blocks that it can then scale to larger topologies. A big part of this announcement is also a reminder to folks in the industry that NVIDIA can scale to 4608 accelerators (and more), while many other AI training accelerators cannot scale as easily to that number.

NVIDIA EOS Racks 2024 02 15

Here, we can see that NVIDIA has four systems per rack, which should work out to 32kW or less per rack. That is our assumption until we see the rear of the systems.
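As a rough sketch of what that rack assumption implies at the cluster level, the snippet below divides the 32kW budget across four systems and scales it to 576 systems. It assumes every compute rack holds exactly four systems and draws the full 32kW, and it ignores networking, storage, and cooling overhead, so treat the total as an upper bound under those assumptions rather than a measured figure.

```python
import math

# Figures from the article.
DGX_SYSTEMS = 576
SYSTEMS_PER_RACK = 4
RACK_POWER_BUDGET_KW = 32.0   # "32kW or less per rack" (the article's assumption)

# Power budget available to each system if the rack budget is split evenly.
kw_per_system = RACK_POWER_BUDGET_KW / SYSTEMS_PER_RACK

# Assumption: every compute rack is fully populated with four systems.
compute_racks = math.ceil(DGX_SYSTEMS / SYSTEMS_PER_RACK)

# Upper bound for compute racks only, excluding network, storage, and cooling.
total_compute_mw = compute_racks * RACK_POWER_BUDGET_KW / 1000

print(f"Budget per system:   {kw_per_system:.1f} kW")
print(f"Compute racks:       {compute_racks}")
print(f"Compute power bound: {total_compute_mw:.2f} MW")
```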

NVIDIA EOS Liquid Cooling 2024 02 15

In the shot above, we can see the liquid cooling rack manifolds behind the individual on the KVM cart. If you want to learn about how liquid cooling works, you can see How Liquid Cooling Servers Works with Gigabyte and CoolIT.

We recently also looked at a liquid-cooled Supermicro SYS-821GE-TNHR 8x NVIDIA H100 AI Server.

And also QCT’s liquid cooling solution:

For future AI servers, folks are going to want to use liquid cooling because of the power efficiency gains from the transition to liquid versus air. If you want to deploy AI servers, you need to be thinking of liquid cooling, and NVIDIA is showing that with EOS.

Final Words

The NVIDIA EOS supercomputer is one the company can use both for its internal development and for customer work. Having a large-scale cluster available is something that differentiates NVIDIA from some of its competitors.

This was another small look at EOS, the system NVIDIA showed last November. I know Patrick has been trying to do an in-person tour for a long time, similar to the Intel Developer Cloud tour.

Maybe in the B100 generation, that will happen?
