This is the NVIDIA DGX GB200 NVL72

Patrick With NVIDIA DGX GB200 NVL72 At GTC 2024

On the show floor at GTC 2024, NVIDIA had a rack configured with its DGX GB200 NVL72. Since we grabbed a few photos of the future of DGX, we thought we would share them. Seeing the physical system makes it feel more tangible.

This is the NVIDIA DGX GB200 NVL72

That may sound like an unwieldy name (I double-checked that it was the correct name for this system), but it is very descriptive. It is an NVIDIA DGX system. The GB200 tells us it uses the Grace Blackwell GB200 compute architecture. The NVL72 tells us that 72 GPUs are connected in a single NVLink domain.

NVIDIA GB200 72 Blackwell GPUs Fully Connected By NVLink

Here is the 120kW flagship system stacked in a single rack. A huge portion of data centers today can support at most 60kW per rack, so there will be other configurations. We fully expect a half-stack system in the future for those who cannot handle 120kW per rack, or nearly 1MW for an eight-rack SuperPOD.
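
To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. The 120kW per rack and eight-rack SuperPOD figures are the ones quoted above; the half-stack number is simply our assumption of half the full-rack load.

```python
# Rough rack power math for the DGX GB200 NVL72 (~120kW per rack quoted).
full_rack_kw = 120
half_stack_kw = full_rack_kw / 2   # assumed: a half-stack draws half the load
superpod_racks = 8                 # eight-rack SuperPOD example from above

superpod_kw = full_rack_kw * superpod_racks
print(f"Full rack:       {full_rack_kw} kW")
print(f"Half-stack:      {half_stack_kw:.0f} kW (fits a ~60kW rack budget)")
print(f"8-rack SuperPOD: {superpod_kw} kW (~{superpod_kw / 1000:.2f} MW), before networking and cooling")
```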

NVIDIA DGX GB200 NVL72 Front 1

At the top, we have switches.

NVIDIA DGX GB200 NVL72 Switches

Moving down the stack, we can see the compute nodes. There are ten compute nodes in the top stack. On the front panel, we can see each node’s dual InfiniBand ports, four E1.S drive trays, and management ports. On the right side, we can see the BlueField-3 DPUs.

NVIDIA DGX GB200 NVL72 Compute Node Front

Each of these compute nodes has two Grace Arm CPUs, and each Grace is connected to two Blackwell GPUs, for a total of four GPUs per node.
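
As a minimal sketch of that layout (the device names are illustrative labels, not real identifiers):

```python
# One GB200 compute node as described above: two Grace CPUs, each paired
# with two Blackwell GPUs. Names are illustrative labels only.
node = {
    "grace0": ["blackwell0", "blackwell1"],
    "grace1": ["blackwell2", "blackwell3"],
}

cpus = len(node)
gpus = sum(len(attached) for attached in node.values())
print(f"Per node: {cpus} Grace CPUs, {gpus} Blackwell GPUs")
```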

NVIDIA Blackwell Compute Node

Here is another look at the internals both with and without cooling blocks.

NVIDIA DGX GB200 NVL72 Compute Nodes Internal

Below these compute nodes are the nine NVSwitch shelves. Something to note is that the gold features are handles used to remove the shelves.

NVIDIA DGX GB200 NVL72 NVSwitch Front

These NVLink Switch trays each have two NVLink switch chips.
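
If you want to see how the switch count lines up with the GPU count, here is a quick sketch. It assumes the publicly quoted NVLink 5 figures of 18 links per Blackwell GPU and 72 NVLink ports per switch chip.

```python
# NVLink port-budget check for the NVL72 rack. Assumes NVIDIA's quoted
# NVLink 5 figures: 18 links per Blackwell GPU, 72 ports per switch chip.
gpus = 72
links_per_gpu = 18

switch_trays = 9
chips_per_tray = 2
ports_per_chip = 72

gpu_links = gpus * links_per_gpu                               # 1296
switch_ports = switch_trays * chips_per_tray * ports_per_chip  # 1296

print(f"GPU-side links:    {gpu_links}")
print(f"Switch-side ports: {switch_ports}")
print("Enough switch ports for every GPU link" if switch_ports >= gpu_links else "Port shortfall")
```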

NVIDIA GB200 Internal NVLink Switch

Here is what they look like inside.

NVIDIA DGX GB200 NVL72 Switch Tray Internal

On the bottom, we have eight more compute nodes.
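
Adding up everything in the rack (ten nodes in the top stack, eight down here, and two Grace CPUs plus four Blackwell GPUs per node), the totals work out as follows.

```python
# Tallying the rack: 10 compute nodes in the top stack plus 8 in the bottom,
# with 2 Grace CPUs and 4 Blackwell GPUs per node, as described above.
top_nodes = 10
bottom_nodes = 8
grace_per_node = 2
blackwell_per_node = 4

nodes = top_nodes + bottom_nodes
print(f"{nodes} nodes -> {nodes * grace_per_node} Grace CPUs, "
      f"{nodes * blackwell_per_node} Blackwell GPUs")
# 18 nodes -> 36 Grace CPUs, 72 Blackwell GPUs, matching the NVL72 name.
```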

NVIDIA DGX GB200 NVL72 Compute Nodes Bottom

Moving to the rear, we can see that this is a bus bar power delivery design. The rack is designed to blind-mate the power via the bus bar, as well as the liquid cooling nozzles and the NVLink connections for each component. Each of these connections needs to allow for a bit of movement to ensure the blind mating works properly.

NVIDIA DGX GB200 NVL72 Rear

NVIDIA said that using the copper-cabled NVLink in the rear saves something like 20kW of power in a rack like this.

NVIDIA DGX GB200 NVL72 NVLink Spine Without Optics

The net result is one giant system.

Final Words

Hopefully, this was a quick and fun look at the NVIDIA DGX GB200 NVL72. For those unsure of what they were seeing, our original NVIDIA GTC 2024 Keynote Coverage goes into many of the components. Still, it is cool to see the system in person and get a bit of a closer look.

NVIDIA GB200 Close

We will have more from the show floor this week, so stay tuned.

NVIDIA Blackwell GPU

12 COMMENTS

  1. So, 120 kW. Are the power bus bars 48VDC, or are they putting AC PSes on each node? In either case, that’s an immense amount of current. 2500A of 48 VDC, or 500A of 240V. Or are they doing something clever here, like really high voltages or high-frequency AC? It’s hard to see how the power losses in conversion would pay off, but sometimes surprising results can pop out of this sort of thing once you factor in cooling, backup power, redundancy, etc.

  2. I mean, trying to run 120 kW through a single copper bus bar at 48V leads to a comically large bus bar. Something like 5 square inches. Presumably they break the bus bar up into segments, and likely have 2 busses as well, but there’s no way to keep that from being enormous. 240V AC is less bad, but still huge.

  3. @Scott Laird, looking at the close-up of the system node, you can see two fairly large power cables going into a power board right in front of the fans. It appears to be just a DC-DC converter, so I’m guessing that it’s a single bus running 480 VDC, which would be relatively efficient to get to from 480 VAC 3 phase Wye configuration. By your calculations, that would then be only 250A on the bus, which isn’t terrible.

  4. Holy crap. That’s one hell of a mainframe, if I do say so myself. But why? Why does AI require that much power? Just insane.

  5. @Guy Fisher
    My conclusion is 48V. There are 2 sets of 3 power shelves above and below the main compute block. Each shelf has 6 front-ends. This configuration matches with OpenRack V3. The front ends can each be 3kW or apparently soon up to 5.5kW, giving either 108kW or >198kW of PSU. With the larger ones that is enough for some redundancy even with say 4kW front-ends. Also the compute sleds seem narrower than the power shelves, which would match with an OpenRack design using 21″ widths, but keeping flexibility on the compute side. 48V would be 2500 amps, but fed top and bottom, so no more than 1250 on the bus bar in any location, and only for a short distance.

  6. I have a question about the NVLink networking.
    Now my understanding is that each node has 2 boards effectively (in this case a board consists of 2 GPUs and a CPU), like a SuperMicro Twin server.
    And then each board has 2 NVLinks of 1.8TB/s per port. Which means 4 ports per node in total.
    And then there are 9 NVLink switches that support 8 ports in total per switch. So each 1U NVLink switch can connect 2 nodes.
    But how do those 9 switches get connected to each other? Because the way it’s presented, it looks like only 2-node sets are connected to each other, for 9 such sets, with no hub connection (or top-of-rack, if you will) between the 9 NVLink switches.

    For clarity, I’m not talking about the Infiniband or Ethernet connections. In a picture from their presentation it shows that each node has 4 NICs with 2 ports per NIC, at 800Gbps per port. But that is for the inter-server connections, not the GPUs, which in my understanding communicate over NVLink.

  7. @Stephen Beets yes AI power requirements are utterly insane as is the price. These are rough ballparks but a 2000 GPU system will cost ~$80 million to procure, and that doesn’t include the data centre, HVAC etc. The annual electricity bill will be in the region of $8-10 million and you are looking at needing a 3-4MW connection to the electricity grid depending on your cooling method. I spent the first six weeks of this year getting actual quotes from vendors as part of a bidding process to host such a system.

  8. @Panos Grigoriadis
    NVLink works in a Peer-to-Peer fashion. Each of the 9 NVLink switch boards has two chips, each providing 4 ports with 18 links, so 9*2*4*18=1296 total.
    Each B200 GPU has 18 links. The entire rack has up to 72 GPUs, each with 18 links, which also gives 1296 links total. Thus every GPU can talk to every other GPU in the system.
    What I’m wondering is what NVIDIA meant by “fifth-generation NVLink, which connects up to 576 GPUs in a single NVLink domain with over 1 PB/s total bandwidth”. I don’t know how that can be achieved with what was shown for the single rack NVLink networking. That would indeed require something akin to top of the rack NVLink switch. Unless they are using the 800GBit/s “classical” networking in some fashion to connect the 8 racks.

  9. Looking at what was used for the Hopper generation, it seems that the Blackwell NVLink switches shown here are not the only design. Most likely there’s an extended version that contains uplinks to an aggregate NVLink Network Switch. There’s a lot of free space in the chassis.
    Hopper used 400GBit/s PHY as base, so it’s not unreasonable to expect 800GBit/s for Blackwell. The Hopper design had a presentation at HotChips 2022.

  10. Thanks Patrick! This is why I come here. Real systems.

    BTW, can you confirm the gold handles are 24k or are they merely 18k?
