Today at GTC 2020 (#2), NVIDIA is announcing or re-announcing its DPU portfolio. First, we have the NVIDIA BlueField-2 DPU that was first announced and shown by Mellanox at VMworld 2019. The company also showed off BlueField-2 at VMworld 2020. Perhaps the more intriguing option is the BlueField-2X DPU. The NVIDIA BlueField-2X combines Mellanox networking, Arm cores, and an NVIDIA GPU, which is especially exciting.
What is a DPU?
If you are looking for a quick primer on “What is a DPU?” we have our What is a DPU? A Data Processing Unit Quick Primer piece, along with an accompanying video.
In those pieces, we cover BlueField-2 and some of the other promising DPU solutions in the market.
NVIDIA BlueField-2 and BlueField-2X DPUs
This one feels a bit like an announcement that already happened. First, we had the Mellanox Bluefield-2 IPU SmartNIC launch in 2019. At the Ampere launch, we saw a ConnectX-6 NIC plus NVIDIA A100 GPU solution in the NVIDIA EGX A100. Let us go over what NVIDIA is announcing. Here are the block diagram and features from VMware Project Monterey.
This aligns closely with what Mellanox showed off as BlueField-2 in 2019.
At GTC 2020, the company is giving a bit more detail on what the cards look like. We can see that the new version is still powered by Arm Cortex-A72 cores and ConnectX-6 IP. The new card is full height to support two QSFP28 100Gbps connectors for InfiniBand or Ethernet.
NVIDIA says that all of this offload “replaces 125 x86 cores” in its specs.
Since we have covered the BlueField-2 a lot, we are going to stop there. Again, you can see our recent What is a DPU? A Data Processing Unit Quick Primer piece where we go into how these specs compare with other current DPUs from other companies.
While BlueField-2 is cool, the NVIDIA BlueField-2X is a much more important product. The company is effectively taking the BlueField-2 card and adding an Ampere generation GPU.
This is very similar to what we saw NVIDIA launch at the first GTC 2020 with the NVIDIA EGX A100. Here is that card:
The big difference with the new NVIDIA BlueField-2X is that the card uses ConnectX-6 Dx IP as part of the BlueField DPU. Effectively, NVIDIA is able to run a Linux OS, or in the near future VMware ESXi, on the BlueField-2X. The GPU is connected to the BlueField-2X’s PCIe Gen4 x16 lanes. From there, the DPU is connected to a fabric. This is the power of what NVIDIA is building. They can put an entire system on an SoC (BlueField-2) with the CPU, NICs, accelerators, and security. They can then use that to attach GPUs to the network without the x86 hosts that are used today.
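To put the GPU-to-DPU-to-fabric path in rough numbers, here is a back-of-the-envelope sketch comparing a PCIe Gen4 x16 link with the network side of the card. It assumes the BlueField-2X keeps the two 100Gbps ports described for BlueField-2 above, which NVIDIA has not confirmed here, and it ignores PCIe packet overhead. The function names are ours for illustration only.

```python
# Rough comparison of PCIe Gen4 x16 bandwidth vs. network line rate.
# Assumption (not confirmed for BlueField-2X): two 100 Gbps network ports.

def pcie_gen4_bandwidth_gbps(lanes: int) -> float:
    """One-direction PCIe Gen4 bandwidth in Gbps after 128b/130b encoding."""
    raw_gt_per_lane = 16.0            # PCIe Gen4 signals at 16 GT/s per lane
    encoding_efficiency = 128 / 130   # 128b/130b line encoding
    return lanes * raw_gt_per_lane * encoding_efficiency


def network_line_rate_gbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate one-direction network line rate in Gbps."""
    return ports * gbps_per_port


pcie_gbps = pcie_gen4_bandwidth_gbps(16)
net_gbps = network_line_rate_gbps(2, 100.0)

print(f"PCIe Gen4 x16: {pcie_gbps:.1f} Gbps ({pcie_gbps / 8:.1f} GB/s)")
print(f"2 x 100GbE/IB: {net_gbps:.1f} Gbps ({net_gbps / 8:.1f} GB/s)")
print(f"PCIe headroom over network: {pcie_gbps - net_gbps:.1f} Gbps")
```

Under those assumptions, the PCIe Gen4 x16 link to the GPU has somewhat more raw bandwidth than the two network ports combined, so the fabric side rather than the GPU link would be the first limit on traffic flowing straight from the network to the GPU.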
To bring about this change, we have NVIDIA DOCA. NVIDIA DOCA is what the company plans to be the CUDA of the DPU era. It is a full SDK for BlueField-2 DPUs so that applications can be built atop DOCA and leverage NVIDIA hardware much in the same way that applications can leverage CUDA today.
This is still the early days of DOCA, but we hope to see more on this in the near future.
The power of this solution goes beyond today’s announcement. As you will notice, the BlueField-2 and BlueField-2X accelerators have RDMA and RoCE acceleration. While NVIDIA did not announce it today, we know that NVIDIA is using scale-out software-defined storage solutions in its Selene Supercomputer. BlueField-2 is designed to service these types of storage applications as well.
On a broader note, this solution brings together traditional NVIDIA GPUs and the networking prowess that came with the recent Mellanox acquisition. Embedded here are Arm compute cores that run their own OS. Make no mistake, the NVIDIA BlueField-2X is a shot across the x86 bow. NVIDIA did not confirm the GPU details, but it is not hard to imagine that one day it will include a flagship-level GPU such as the NVIDIA A100. It is clearly something we see as the future vision of NVIDIA.