After our "ZFS without a Server Using the NVIDIA BlueField-2 DPU" piece, we had some questions about the performance of the BlueField-2 NIC, and I sent a note to Patrick with a crazy idea: what if we compared the chip to a Raspberry Pi 4 B, or maybe to some of those small passively cooled router/firewall nodes we have been testing? We hit a small issue. The BlueField-2 DPU did not have enough onboard storage to run our normal benchmark suite in sequence, and even over the 100GbE ports, we had some issues with non-local storage. So we pivoted and uploaded a few Geekbench 5 results so that our readers can get some sense of the performance and do their own comparisons.
The NVIDIA BlueField-2 DPU
What is important here is that we are using a specific SKU: the actual part is the NVIDIA MBF2M516A-CEEOT. This SKU has eight Arm Cortex A72 cores running at 2.0GHz.
This is important because it is an E-series DPU, not a P-series. NVIDIA has P-series DPUs that run at 2.75GHz, which we wish we had, but all of our DPUs are E-series. The advantage of the E-series is that one does not need to add extra power since it is solely powered by the PCIe x16 slot. On the other side of the equation, the extra 750MHz is a 37.5% boost in clock speed for these parts, which means there should be a lot more performance available.
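As a quick check of that clock speed arithmetic, here is a minimal Python sketch. The function and variable names are ours for illustration only. Clock speed is also only a rough proxy, so the actual performance gain of a P-series part may differ from the clock uplift.

```python
def clock_uplift_percent(base_ghz: float, faster_ghz: float) -> float:
    """Return how much higher faster_ghz is than base_ghz, in percent."""
    return (faster_ghz - base_ghz) / base_ghz * 100.0


# Clock speeds quoted in the article (illustrative variable names).
e_series_ghz = 2.0   # E-series BlueField-2, the card tested here
p_series_ghz = 2.75  # P-series BlueField-2

uplift = clock_uplift_percent(e_series_ghz, p_series_ghz)
print(f"P-series clock uplift over E-series: {uplift:.1f}%")
```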
Here is the card. It is also worth noting that the 16GB of memory is single channel, not dual channel, and that there are versions with 32GB onboard.
While having 37.5% higher clock speeds and double the memory footprint would be nice, this is what we have.
NVIDIA BlueField-2 DPU (MBF2M516A-CEEOT) Geekbench 5 Result
Geekbench is far from perfect, and it is also not the application that is going to take advantage of accelerators, the 100GbE, PCIe Gen4, and so forth. At the same time, it has a fairly decent result set and it is fairly easy for other folks to submit to. One of the challenges is that the only existing scores on GB5 were not labeled well and were lower than what we got. Here is the link to our result on the BlueField-2 DPU. That can be used to compare to other CPUs. For example, when we did the "Building the Ultimate x86 and Arm Cluster-in-a-Box" piece, folks asked how fast a BlueField-2's processor is compared to a Raspberry Pi 4. We generally say 2.5-3x, and one can see that here:
That makes the cluster-in-a-box with seven of these roughly equal to 20 or so Raspberry Pi 4s just in Arm compute, without taking into account the AMD Ryzen Threadripper PRO 3995WX. It is also not using an HPC-focused benchmark where the AMD processor would do much better than the A72s.
On the Intel side, perhaps the closest we have found is the Intel Celeron N5105. This is an 8-core BlueField-2 versus 4-core Jasper Lake Atom chart. If you want to know why we are excited about the prospects of a potential Intel Atom C5000 series, this is a good example:
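The "20 or so" figure follows from multiplying the seven DPUs by the 2.5-3x rule of thumb. A minimal Python sketch of that estimate is below. The function name is hypothetical, and the ratios are the rough range quoted above, not measured values.

```python
def pi_equivalents(num_dpus: int, low_ratio: float, high_ratio: float) -> tuple[float, float]:
    """Estimate how many Raspberry Pi 4s a set of DPUs roughly matches in Arm compute."""
    return num_dpus * low_ratio, num_dpus * high_ratio


# Seven BlueField-2 DPUs, each assumed to be 2.5x to 3x a Raspberry Pi 4.
low, high = pi_equivalents(7, 2.5, 3.0)
print(f"Roughly {low:.1f} to {high:.1f} Raspberry Pi 4 equivalents")
```

This is a simple linear scaling estimate. It ignores memory, storage, and networking differences between the platforms.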
Take the result linked above and drill into your favorite comparison. There are also sub-test components that one can look at in more depth. For example, the AES-XTS numbers for BlueField are quite good.
Again, Geekbench 5 is testing something that is not really the main value driver for the BlueField-2 DPU, but we wanted some way to give a rough estimate of how fast the Arm cores are on the BlueField-2 DPU. There are "P-series" DPUs that have 37.5% higher clock speeds and that will be faster. Also, NVIDIA is going to have BlueField-3 later this year, so this is later in the cycle. The next-generation DPU/IPU parts from NVIDIA, Marvell, Intel, and others will have substantially increased Arm CPU performance. Still, we hope this helps give some sense of performance.
Here is the video for the article mentioned earlier about the BlueField-2 DPU and some fun we had with it recently.
The charts say "difference", but you would need to subtract "100%" from each of those values for it to be the difference, instead of the relative performance.
Oh, that was fun, thanks for the GB5 results.