Getting the NVIDIA BlueField-2 DPU Ready
BlueField-2 DPUs have many quirks. One of them is just how many ways there are to get access. We have both 25GbE and 100GbE versions from early 2021, and all of them are set up similarly from the factory. NVIDIA is rapidly iterating on the software stack, so this may change. Still, out of the box one can use serial/USB to configure the DPUs, or use a network interface. The interface we are after is oob_net0, which connects to the DPU's out-of-band management port and, on our units, defaults to DHCP for an address. Perfect! The challenge is that tmfifo_net0, the virtual interface used to talk to the host machine over PCIe, also comes with a default gateway. That means the DPU will default to routing its traffic through the host machine, and that is not what we want. There is an out-of-band management interface, so we want to use that instead. A quick trick is to edit the netplan configuration and comment out the gateway/nameservers lines for tmfifo_net0, and then everything will work.
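As a sketch, the relevant netplan stanza ends up looking something like the following. The filename and the specific addresses here are assumptions for illustration; check the netplan file actually shipped on your image before editing:

```yaml
# Example netplan fragment; filename and addresses are placeholders
# (check /etc/netplan/ on your DPU image for the real file)
network:
  version: 2
  ethernets:
    tmfifo_net0:
      addresses:
        - 192.168.10.2/30
      # Commented out so the DPU stops routing through the host over PCIe:
      # gateway4: 192.168.10.1
      # nameservers:
      #   addresses: [192.168.10.1]
    oob_net0:
      dhcp4: true
```

After saving the change, a sudo netplan apply picks it up without a reboot.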
With that change, we can still SSH into the card on 192.168.10.2, but we are now effectively using the OOB management port to access the DPU. You can also remove the tmfifo interface entirely if you do not want it, for security reasons.
The next step is always to update. The card runs Ubuntu 20.04 LTS, and it uses sh as the default shell (which sucks; please give us bash as the default, NVIDIA.) Docker is also pre-installed. So what we can do now is update the DPU using
apt-get update and
apt-get upgrade. Software evolves rapidly, so this is a step I highly suggest. Please note that this is unlikely to be a 30-second process. These are Arm Cortex-A72 cores, so go grab a coffee. With our little fleet of machines, the most common issue we have had updating is that the clock resets to 1970, so none of the repos validate. We usually just use
timedatectl to set the time properly.
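Putting those steps together, a minimal update pass on the DPU might look like the following sketch. Enabling NTP via timedatectl is one way to fix the 1970 clock; depending on your network, you may need to set the time manually instead:

```shell
# Fix the clock first, otherwise repository signature checks can fail
sudo timedatectl set-ntp true
timedatectl status   # confirm the system clock is synchronized

# Then update; on the Arm Cortex-A72 cores this can take a while
sudo apt-get update
sudo apt-get -y upgrade
```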
Something we are going to quickly note is that we are using Ubuntu. NVIDIA has a set of CentOS 8 images that is basically not recommended now, with IBM-Red Hat bidding farewell to CentOS. There are also RHEL and Debian images. There are tools to roll your own OS, and that may be easier than updating everything on these DPUs, especially if you have an older card.
The next step is to install ZFS using
sudo apt install zfsutils-linux and it works very easily here:
Although this is Ubuntu, since this is the Arm/BlueField version, we also need to install zfs-dkms via
sudo apt install zfs-dkms as you can see here:
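For reference, the two install commands plus a quick sanity check can be run as follows; the DKMS build step is part of why this takes noticeably longer on the Arm cores:

```shell
# Install the ZFS userland tools and the DKMS kernel module
sudo apt install -y zfsutils-linux
sudo apt install -y zfs-dkms   # needed on the Arm/BlueField build of Ubuntu

# Sanity check that the module built and loaded
sudo modprobe zfs
zfs version
```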
Now we are going to use the three 960GB add-in-card SSDs in the AIC JBOX to create a little ZFS RAID array.
We are going to take our 3x 960GB SSDs and make sthbf2pool (yes, that is the STH BlueField-2 pool) using zpool create with raidz. Some may pick up on the little extra-credit work that is going on here. Again, this is being done to show a concept, not as a production array, so we are keeping this simple.
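As a sketch, with the three SSDs visible to the DPU, the pool creation looks something like the following. The /dev/nvme* names are placeholders for illustration; check lsblk for the actual device names on your system before running this:

```shell
# Device names are placeholders; verify with lsblk before running
sudo zpool create sthbf2pool raidz /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# Confirm the RAID-Z pool is online
zpool status sthbf2pool
zfs list sthbf2pool
```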
So at this point, our BlueField-2 DPU has a RAID-Z array across its three SSDs. It is also not using the host x86 system here; this is all being done on the DPU's Arm cores. This is a ZFS RAID array running entirely on BlueField.