ZFS without a Server Using the NVIDIA BlueField-2 DPU


The Magic – This is Actually a Multi-DPU ZFS and iSCSI Demo

Now for the more exciting part. This may be the mental model you have of the setup, but that would be incorrect.

NVIDIA BlueField 2 With ZFS Running On AIC JBOX Arm Cores Example

You will notice that the drives are showing up in the system as /dev/sda, /dev/sdb, and /dev/sdc. Normally, an NVMe SSD like the ones we are using would be labeled “nvme0n1” or something similar instead of “sda”. That is because there is something even more relevant to the DPU model going on here: there are actually two DPUs being used.
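
As a quick way to see why the naming looks like this, the transport column in lsblk tells the story on the DPU running ZFS. This is a minimal sketch rather than the exact command in the screenshot:

    # List block devices with their transport type. Disks that arrive over
    # iSCSI come in through the SCSI stack, so they enumerate as sdX and the
    # TRAN column reads "iscsi" instead of "nvme".
    lsblk -o NAME,SIZE,TRAN,TYPE,MODEL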

NVIDIA BlueField 2 DPU Mount ISCSI Volumes

The first DPU is exporting the three 960GB NVMe SSDs in the AIC JBOX as iSCSI targets. The second DPU, the one we have been showing, sits in the host AIC server and serves as the iSCSI initiator before handling ZFS duties. This is what the setup we are using actually looks like:

NVIDIA BlueField 2 With ZFS Running In AIC Server To DPU In AIC JBOX With ISCSI In OOB Example
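
For those who want a rough idea of the plumbing, the export and login steps are ordinary Linux iSCSI administration on the two Arm systems. Here is a minimal sketch using the standard LIO targetcli and open-iscsi tools; the IQNs, IP address, and device names are placeholders rather than the exact values from our setup:

    # On DPU1 in the AIC JBOX: export each NVMe SSD as an iSCSI LUN via LIO
    targetcli /backstores/block create name=nvme0 dev=/dev/nvme0n1
    targetcli /iscsi create iqn.2021-10.com.example.jbox:nvme0
    targetcli /iscsi/iqn.2021-10.com.example.jbox:nvme0/tpg1/luns create /backstores/block/nvme0
    targetcli /iscsi/iqn.2021-10.com.example.jbox:nvme0/tpg1/acls create iqn.2021-10.com.example.host:dpu2
    # (repeat the block/target steps for nvme1n1 and nvme2n1; depending on the
    # targetcli version you may also need to create a portal on the OOB IP)

    # On DPU2 in the AIC server: discover the targets over the 1GbE OOB
    # network and log in, at which point the LUNs appear as /dev/sdX
    iscsiadm -m discovery -t sendtargets -p 192.168.100.10
    iscsiadm -m node --login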

What we essentially have is two nodes doing iSCSI and ZFS without any x86 server involvement beyond the second DPU getting power and cooling from its PCIe slot. Performance is pretty darn poor in this example because everything is running over the out-of-band 1GbE management ports and not using the accelerators onboard the DPU.

NVIDIA BlueField 2 DPU ISCSI To BF2 DPU ZFS Performance On 1GbE OOB NICs Only
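
If you want to run a similar quick check on your own setup, something along these lines works. This is a sketch, not the exact test in the screenshot; it assumes the pool on DPU2 is named tank and mounted at the default /tank (both placeholders), and note that a stream of zeros is a poor payload if compression is enabled on the dataset:

    # Rough sequential write test into the pool; conv=fdatasync makes dd wait
    # for the data to reach the pool rather than just the page cache
    dd if=/dev/zero of=/tank/ddtest bs=1M count=2048 conv=fdatasync

    # Watch per-device throughput while the test runs; a single 1GbE link
    # tops out at roughly 110-118MB/s, so that is the ceiling here
    zpool iostat -v tank 1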

Still, it works. That is exactly the point. Without any x86 server in the path, we attach SSDs to DPU1 in the AIC JBOX. We then use DPU1 to expose that storage as block devices on the network. Those block devices, exposed by DPU1, are picked up by DPU2, which builds a ZFS RAID array from them. That ZFS RAID array is then able to provide storage. Again, this is network storage without a traditional server on either end.
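
For completeness, building the array on DPU2 out of the three iSCSI-backed disks is plain ZFS administration. A minimal sketch, with the pool name and RAID level as our own choices rather than necessarily what is in the screenshots:

    # On DPU2: create a raidz1 pool across the three iSCSI block devices
    zpool create tank raidz /dev/sda /dev/sdb /dev/sdc

    # Verify layout and health; from here the pool can be shared out again
    # just like on any other Linux host
    zpool status tank
    zfs list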

Final Words

The real purpose of this piece is to show the concepts behind next-generation infrastructure with DPUs. We purposefully stayed off the accelerators and the 100GbE networking in the NVIDIA BlueField-2 to show the stark contrast of DPU infrastructure using concepts like iSCSI and ZFS that have been around for a long time and that people generally understand well. With erasure coding offloads and more built into next-generation DPUs, many of the older concepts, like having RAID controllers locally, are going to go away as the network storage model absorbs that functionality. Still, the goal was to show folks DPUs in a way that is highly accessible.

Ultimate Mini Cluster In A Box 2021 7x NVIDIA BlueField 2 DPUs 2

The next step for this series is showing DPUs doing NVMeoF storage using the accelerators and offloads as well as the 100GbE ConnectX-6 network ports. The reason we used this ZFS and iSCSI setup is that it showed many of the same concepts, just without going through the higher-performance software and hardware. We needed something to bridge the model folks are accustomed to and the next-generation architecture, and this was the idea. For the next version, we will have this:

NVIDIA BlueField 2 DPU NVMeoF Example Using BF2 AIC JBOX Target
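
As a taste of what changes, the initiator-side commands in that version swap the iSCSI tooling for nvme-cli. A hedged sketch, with the transport (shown as RDMA, though TCP works the same way), address, and NQN all placeholders:

    # Conceptual NVMeoF equivalent of the iSCSI discovery/login steps
    nvme discover -t rdma -a 192.168.100.10 -s 4420
    nvme connect -t rdma -a 192.168.100.10 -s 4420 -n nqn.2021-10.com.example.jbox:nvme0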

Not only are we looking at NVIDIA’s solution, but we are going to take a look at solutions across the industry.

12 COMMENTS

  1. You’ve sold me on DPU for some network tasks, Patrick.

    This article… two DPUs to export three SSDs. And, don’t look over there, the host server is just, uh, HOSTING. Nothing to see there.

    How much does a DPU cost again?

    Jensen should just bite the bullet and package these things as… servers. Or y’all should stop saying they do not require servers.

  2. This is amazing. I’m one of those people that just thought of a DPU as a high-end NIC. It’s a contorted demo, but it’s relatable. I’d say you’ve done a great service.

  3. Okay, that was fun. Does NVIDIA have tools available to make their DPUs export storage to host servers via NVMe (or whatever) over PCIe? Could you, say, attach Ceph RBD volumes on the BlueField and have them appear as hardware storage devices to host machines?

    Chelsio is obviously touting something similar with their new boards.

  4. SNAP is the tool and there’s DOCA. Don’t forget that this is only talkin’ about the storage, not all the firewalls that are running on this, and it’s not using VMware Monterey either.

  5. Scott: Yup! one of the BlueField-2’s primary use cases is “NVMEoF storage target” – there are a couple of SKUs explicitly designed for that purpose, which don’t need a PCIe switch or any weird fiddly reconfiguration to directly connect to an endpoint device (though it’d be a bit silly to only attach 4 PCIe 4.0 x4 SSDs to it!)

  6. Scott: Ah, just realized I misread – but yes, that’s one of the *other* primary use cases for BF2, taking remote NVMEoF / other block devices and presenting them to the host system as a local NVMe device. This is how AWS’s Nitro storage card works, they’re just using a custom card/SoC for it (the 2x25G Nitro network and storage cards use the same SoC as the MikroTik CCR2004-1G-2XS-PCIe, so that could be fun…)

  7. I think that MikroTik uses an Annapurna chip, but it’s more like the ones they use for disk storage, not the full Nitro cards being used today. It doesn’t have enough bandwidth for a new Nitro card.

  8. Hope we will arrive at two DPUs both seeing the same SSDs. Then ZFS failover is possible, and you could export ZFS block devices however you like or do an NFS export. Endless possibilities, like, well, a server. The real power of a DPU only comes into play when the offload functionality is used.

  9. Did you need to set any additional mlxconfig options to get this to work?
    Any other tweaks required to get the BlueField to recognize the additional NVMe devices?

  10. Thanks for the good article. I like this company’s equipment; it stands out for its reliability. If possible, write a few more articles about it. I think it would be interesting to read.
