We recently found a product at the bottom of a page on NVIDIA’s website that had received little publicity. That product is the NVIDIA BlueField-2 A100. The basic concept is to combine a BlueField-2 DPU, which includes an Arm-based CPU, a Mellanox NIC, and associated memory and storage, with an NVIDIA A100 GPU. We reached out to NVIDIA about the product and did not get much back. Still, this is an early product that demonstrates both why NVIDIA acquired Mellanox and the power of a DPU, so we wanted to highlight it.
NVIDIA BlueField-2 A100
NVIDIA has discussed a few concepts for combining Mellanox IP with GPUs in the past. Specifically, we previously focused on the NVIDIA EGX A100. We also heard about the NVIDIA BlueField-2X, which looks similar but has a major difference. The NVIDIA BlueField-2 A100 is a bit different from both. Here is the photo of the EGX A100 again:
The EGX A100 uses a ConnectX-6 NIC with an NVIDIA A100 GPU, but this is not the DPU version. The PCB has a cutout at the end for a fan, and the top of the PCB has NVLink connectors. We can contrast this with the BlueField-2 A100:
As you can see, the faceplate has what appear to be the higher-end NVIDIA BlueField-2 ports. Here is the 100GbE/IB BlueField-2 DPU as an example:
We can see that the out-of-band management port and the data ports are flipped on the A100 version. This is similar to what we saw with the BlueField-2X:
At first, one may think that this looks very similar to the BlueField-2 A100. There is a significant difference, though. If we look at the top portion of the BlueField-2X, the edge of the PCB is flat.
When we zoom in on the BlueField-2 A100, we can see top edge connectors like those on the EGX A100 and on A100 PCIe GPUs.
NVIDIA has three sets of top connectors on standard A100 PCIe cards that are used for NVLink bridges. Here are two A100s with the NVLink bridges attached:
It seems like the BlueField-2 A100 is similar to the BlueField-2X, except perhaps that it uses the A100 (the BlueField-2X GPU was not listed) and has these edge connectors.
What makes this card exciting is that the BlueField-2 A100 can put an NVIDIA A100 directly on a network fabric without going through a traditional x86 server. Unlike the EGX A100, the BlueField-2 A100 has CPU cores, memory, and storage, and runs its own OS. Here is a BlueField-2 Ubuntu shot from our piece A Quick Look at Logging Into a Mellanox NVIDIA BlueField-2 DPU.
There are, of course, trade-offs with the BlueField-2 A100. One of them will be power and cooling. The BlueField-2 DPU uses enough power that going above 2.0GHz on its eight Arm cores means an auxiliary power connector is needed for the x16 card. A DPU that takes 60-75W from a PCIe power budget of 250W (or 300W with the new NVIDIA A100 80GB PCIe) consumes a large share of it: roughly 24-30% of 250W, or 20-25% of 300W.
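To put that power trade-off in numbers, here is a minimal back-of-the-envelope sketch. It assumes the DPU and GPU simply share one card-level budget, which NVIDIA has not confirmed for this product; the function and variable names are ours and purely illustrative.

```python
# Rough power-budget arithmetic using the figures quoted in the article.
# Assumption: the DPU's draw is subtracted directly from the card's total
# PCIe power budget. NVIDIA has not confirmed how the budget is shared.

DPU_RANGE_W = (60, 75)  # DPU power range quoted above
CARD_BUDGETS_W = {
    "A100 PCIe (250W)": 250,
    "A100 80GB PCIe (300W)": 300,
}


def dpu_share(dpu_watts: float, card_budget_watts: float) -> float:
    """Return the fraction of the card budget consumed by the DPU."""
    return dpu_watts / card_budget_watts


def summarize():
    """Return (name, low share, high share, min W left, max W left) per card."""
    rows = []
    low_w, high_w = DPU_RANGE_W
    for name, budget in CARD_BUDGETS_W.items():
        rows.append((
            name,
            dpu_share(low_w, budget),
            dpu_share(high_w, budget),
            budget - high_w,  # watts left for the GPU at the DPU's high end
            budget - low_w,   # watts left for the GPU at the DPU's low end
        ))
    return rows


if __name__ == "__main__":
    for name, low, high, rem_min, rem_max in summarize():
        print(f"{name}: DPU takes {low:.0%}-{high:.0%}, "
              f"leaving {rem_min}-{rem_max}W for the GPU")
```

Under these assumptions, the GPU would be left with well under its usual power envelope, which is consistent with the idea that lower-power workloads are the better fit for this card.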
Our best guess is that this may not be seen as a training product in this generation. Instead, as we highlighted in our ASUS RS720A-E11-RS24U review, each A100 can be split into up to seven Multi-Instance GPU (or MIG) instances.
MIG works extremely well with AI inference, as we saw with MLPerf Inference v1.0, and inference is often a lower-power workload on the A100. That would effectively allow a BlueField-2 A100 to be attached to a network and serve as an AI inference node.
All of this is fun, but we have yet to see this in the field. Still, it shows the vision of where this is headed, and one can see in it an early precursor to the Arm-azing Grace of the future. If you read STH and are still unsure about what a DPU is, we have What is a DPU A Data Processing Unit Quick Primer and video:
If you want to learn more about the continuum of current NIC solutions on the market, and understand what separates a SmartNIC from a DPU, or a DPU from an exotic solution like Intel’s IPU, you can see the STH NIC Continuum Framework with a video here:
Hopefully, we will see these products in the wild instead of just renderings. NVIDIA did not confirm exact specs for us, but we were told this is an A100 with a BlueField-2 DPU on a single PCIe card. We can update this if we get more specs.