Recently, there has been a lot of confusion in the industry around what distinguishes a DPU, or data processing unit, from a SmartNIC. One of the key challenges is that marketing organizations are chasing buzzwords, and in some cases avoiding them, which makes comparisons difficult. We are introducing the STH NIC Continuum, in its first-draft Q2 2021 edition, to show how we are going to classify NICs at STH. We review a large number of NICs, servers, and switches, so we need a framework for discussing types of NICs, and that is what we have today.
What is a DPU?
Last year, we published a piece, What is a DPU? A Data Processing Unit Quick Primer, which had an accompanying video.
In that piece, we discussed some of the key characteristics that DPUs share. Among them are:
- High-speed networking connectivity (usually multiple 100Gbps-200Gbps interfaces in this generation)
- High-speed packet processing with specific acceleration and often programmable logic (P4/P4-like is common)
- A CPU core complex (often Arm- or MIPS-based in this generation)
- Memory controllers (commonly DDR4, but we also see HBM and DDR5 support)
- Accelerators (often for crypto or storage offload)
- PCIe Gen4 lanes (run as either root complex or endpoint)
- Security and management features (offering a hardware root of trust as an example)
- Runs its own OS separate from the host system (commonly Linux, with ESXi on Arm, the subject of VMware Project Monterey, as another example)
The common theme, though, is that a DPU is designed for disaggregating the infrastructure and application resources in the data center. The DPU is designed to be an infrastructure endpoint that exposes network services to a server and its devices while also securely exposing the server and device capabilities to the broader infrastructure.
With that framing, we are now going to introduce a framework for discussing SmartNIC versus DPU, since that is an area of confusion.
SmartNIC vs DPU
One of the biggest questions we are asked, and one we see vendors struggle with, is how to classify SmartNIC vs DPU vs FPGA-based solutions. As such, we have put together a draft of what we see as the common points across the industry.
With that, here is the first 2021 STH NIC Continuum. We expect to update it over time, but the goal is to frame the question of what distinguishes an Offload NIC from a SmartNIC from a DPU, and what constitutes a more exotic solution, usually based on FPGAs. Many vendors use these terms interchangeably, so we needed a structure to discuss solutions on STH.
Starting with the Foundational NIC, this is the basic level of a network interface today. Almost all modern NICs have some very basic offloads, such as IPv4 header checksum and TCP/UDP checksum offloads (for both IPv4 and IPv6), but Foundational NICs are designed to provide low-cost network ports and forgo many of the higher-end offload features that add cost and complexity.
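To make "checksum offload" concrete, the snippet below shows the kind of arithmetic a NIC takes off the host CPU: the 16-bit one's complement Internet checksum described in RFC 1071, applied here to an IPv4 header. This is a minimal software sketch for illustration only; NICs implement it in hardware, and the TCP/UDP variants additionally cover a pseudo-header that this example omits.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement Internet checksum of `data` (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF  # one's complement of the folded sum


# Example: a 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))  # prints 0xb861

# A receiver verifies by checksumming the header with the field filled in; 0 means valid
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # prints 0
```

Doing this per packet is cheap on its own, but at high packet rates it adds up, which is why even the most basic NICs handle it.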
Foundational NICs remain a very important part of the industry. However, as data rates rise and processing each data flow requires more compute, most of the higher-speed NICs we see are Offload NICs.
Offload NICs are generally found in families that support 100Gbps and faster networking. Some of these families also include lower-speed ports, but network adapters are often designed in generations, and the 100GbE generation created a clear need for a new level of offload. At those data rates, it becomes critically important for the NIC to handle networking functions in hardware, independently of the CPU. That includes offloading some virtualization functions to the NIC as well.
The goal of the Offload NIC is to free the CPU from network processing as much as possible so that more CPU resources are available for running applications. There may be some programmability, but it is more limited than on SmartNICs.
Getting to the SmartNIC vs DPU discussion, the key innovation of SmartNICs over Offload NICs is a more flexible programmable pipeline, which DPUs incorporate as well. Because the term "SmartNIC" was in use well before the industry adopted "DPU," the two are frequently conflated. When we reviewed traditional SmartNIC and DPU materials, we saw a clear shift in the conceptual model. We are therefore defining SmartNICs as NICs that have programmable pipelines to further enhance offload from the host CPU.
In other words, although many SmartNICs run Linux and have their own CPU cores, the function of a SmartNIC is to relieve the host CPU of work as part of the overall server. That is where SmartNICs differ from DPUs, which seem to be more focused on being independent infrastructure endpoints.
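The "programmable pipeline" that separates a SmartNIC from an Offload NIC is typically expressed in P4 or a P4-like language as a series of match-action tables: the control plane installs table entries, and the data plane extracts a key from each packet, looks it up, and runs the matching action. The Python sketch below models that idea in software only. The table, action, and field names (`ExactMatchTable`, `forward`, `drop`, and so on) are hypothetical and do not correspond to P4 syntax or to any vendor's SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Packet:
    dst_ip: str
    dst_port: int
    egress_port: Optional[int] = None  # set by a forwarding action
    dropped: bool = False


Action = Callable[[Packet], None]


def no_op(pkt: Packet) -> None:
    """Leave the packet unchanged."""


def drop(pkt: Packet) -> None:
    pkt.dropped = True


def forward(port: int) -> Action:
    """Build an action that sends the packet out of `port`."""
    def action(pkt: Packet) -> None:
        pkt.egress_port = port
    return action


@dataclass
class ExactMatchTable:
    """One match-action stage: extract a key, look it up, run the matched action."""
    key_fn: Callable[[Packet], Tuple]
    default_action: Action
    entries: Dict[Tuple, Action] = field(default_factory=dict)

    def add_entry(self, key: Tuple, action: Action) -> None:
        # On real hardware, the control plane installs entries at runtime
        self.entries[key] = action

    def apply(self, pkt: Packet) -> None:
        self.entries.get(self.key_fn(pkt), self.default_action)(pkt)


def run_pipeline(stages: List[ExactMatchTable], pkt: Packet) -> Packet:
    for stage in stages:
        if pkt.dropped:
            break  # later stages never see a dropped packet
        stage.apply(pkt)
    return pkt


# Hypothetical two-stage pipeline: an ACL stage, then a forwarding stage
acl = ExactMatchTable(key_fn=lambda p: (p.dst_port,), default_action=no_op)
acl.add_entry((23,), drop)  # block Telnet
fwd = ExactMatchTable(key_fn=lambda p: (p.dst_ip,), default_action=drop)
fwd.add_entry(("10.0.0.2",), forward(1))
PIPELINE = [acl, fwd]

print(run_pipeline(PIPELINE, Packet("10.0.0.2", 443)))
# prints Packet(dst_ip='10.0.0.2', dst_port=443, egress_port=1, dropped=False)
```

The point of the structure is that new behavior comes from installing different tables and entries rather than waiting for a new fixed-function ASIC, which is the flexibility an Offload NIC lacks.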
When we surveyed what is being called a "DPU" today, offload and programmability are certainly key capabilities. The big difference is that vendors are designing DPUs, in the spirit of the AWS Nitro platform, to be infrastructure endpoints. These endpoints may attach storage directly to the network (e.g., the Fungible products), serve as a secure on-ramp to the network (e.g., the Pensando DSC and Marvell Octeon products), or act as more general-purpose endpoints that deliver compute, network, and storage securely to and from the overall infrastructure.
This may seem like a nuanced distinction, but when we looked at what is in the market, there is a clear split between products designed for higher-end offload (SmartNICs) and independent network endpoints delivering services (DPUs). Some of the confusion comes from the highest-end products marketing themselves as SmartNICs or DPUs, but we think they belong in their own category, which we are calling "Exotic."
Exotic NICs, as we are currently calling them, are solutions with enormous flexibility. Often, that flexibility comes from large FPGAs. With FPGAs, organizations can build their own custom pipelines for low-latency networking and can even run applications such as AI inferencing on the NIC itself without using the host CPU.
Generally, though, there is a major difference between the SmartNIC/DPU and the Exotic NIC. Because of that flexibility and programmability, organizations deploying Exotic NICs typically have teams dedicated to extracting value from the NIC by programming new logic for the FPGA. With flexibility comes responsibility, and that is why these solutions need to be categorized outside of the traditional SmartNIC and DPU categories. In many domains, solutions categorized as Exotic can yield impressive results, but they also carry additional design and maintenance considerations that make them best suited to high-end applications.
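One way to read the Continuum is as an ordered set of checks, starting from the most flexible category and falling through to the most basic. The Python sketch below is an illustrative simplification, not a formal STH rubric: the four yes/no traits in `NicProfile` are hypothetical summaries of the category descriptions above, and real products often require judgment calls that a boolean cannot capture.

```python
from dataclasses import dataclass
from enum import IntEnum


class NicClass(IntEnum):
    FOUNDATIONAL = 1
    OFFLOAD = 2
    SMARTNIC = 3
    DPU = 4
    EXOTIC = 5


@dataclass(frozen=True)
class NicProfile:
    # Hypothetical traits distilled from the category descriptions
    hw_network_offloads: bool = False      # network/virtualization functions in hardware beyond checksums
    programmable_pipeline: bool = False    # P4 / P4-like flexible packet pipeline
    infrastructure_endpoint: bool = False  # runs its own OS and serves the infrastructure independently of the host
    custom_fpga_logic: bool = False        # large FPGA that the customer programs with its own logic


def classify(nic: NicProfile) -> NicClass:
    # Check the most flexible traits first; the first match wins.
    if nic.custom_fpga_logic:
        return NicClass.EXOTIC
    if nic.programmable_pipeline and nic.infrastructure_endpoint:
        return NicClass.DPU
    if nic.programmable_pipeline:
        return NicClass.SMARTNIC  # programmable, but its job is offloading the host CPU
    if nic.hw_network_offloads:
        return NicClass.OFFLOAD
    return NicClass.FOUNDATIONAL


print(classify(NicProfile(hw_network_offloads=True, programmable_pipeline=True)).name)  # prints SMARTNIC
```

Ordering the checks this way mirrors the argument above: a product marketed as a "SmartNIC" that is really an FPGA platform for custom logic lands in Exotic regardless of its other traits.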
The STH NIC Continuum is not perfect, but we need a way to categorize solutions so that we can evaluate and present them to our readers. In the accompanying SmartNIC vs DPU video, we go into some of the key DPU players, along with some "honorable mentions" for FPGA solutions, as a way to discuss the current state of the market. Within the DPU family, we see a lot of work being done to optimize for specific use cases, so we wanted to go into what some of the solutions are targeting as the market matures.
On STH, we are starting to do more DPU content, including A Quick Look at Logging Into a Mellanox NVIDIA BlueField-2 DPU. This is an emerging class of data center device, so we want to keep adding content to help our readers understand and evaluate offerings in this space.
Of course, if you have feedback on the first draft of the 2021 STH NIC Continuum, feel free to leave it either here or in the comments on the video. The goal is to help our readers categorize and then evaluate SmartNIC vs DPU vs Exotic (FPGA) solutions at a time when the industry is using these terms without defining what they mean. Since we cover and review many of these solutions, we need some alignment. With the STH NIC Continuum, if something is labeled a "SmartNIC" but is really what we would consider an Exotic solution, we can demonstrate the differences.