For those who follow STH, I have been working on a concentrated line of DPU coverage for several months. If you are not yet familiar with the term, see our What is a DPU or data processing unit? piece. One of the promising areas where DPUs can disrupt the status quo is storage. Today, Fungible embarks on that journey with its DPU-based storage cluster. Since this is STH, we are going to cover the news, but we also got to see the solution hands-on last week and have some video to share. Let us take a look at the Fungible DPU-based storage cluster and the FS1600 DPU storage node to see why this is something to get excited about.
Video Hands-on Look
This article has something a bit different. Normally, our web and video content match fairly closely. For this piece, we have a bit more on the video side:
In this video, we talk about the technology, but we also have an interview with Fungible’s CEO Pradeep Sindhu (formerly co-founder and CTO at Juniper Networks), as well as a product overview, UI look, and performance demonstration from Benny Siman-Tov, Fungible’s VP of Products. This is one of the rare occasions on STH where the video has some content the web version does not. As always, we suggest opening the video on YouTube and listening along.
Fungible DPU and TrueFabric Base
We are not going to go into too much detail here. One of Fungible’s key differentiators is the Fungible DPU. These DPUs utilize data clusters with MIPS cores, high-speed memory interfaces, accelerators, a PCIe root complex, and high-end networking to move data faster and more efficiently than traditional general-purpose CPUs such as Intel Xeon or AMD EPYC.
Since the folks at Fungible have deep networking backgrounds, they are focused not only on the endpoint processors but also on a data fabric called TrueFabric that is designed to handle the scale-out and QoS aspects of running applications such as storage across DPUs.
With those base building blocks, today Fungible has its new storage cluster and DPU-based storage nodes to deliver a high-performance storage solution.
Fungible DPU-based Storage Cluster
In 2020, a single-node or simple HA-pair storage cluster seems less exciting. Today, efforts are focused on scale-out storage, and that is where the Fungible Storage Cluster solution is designed to play. Powered by the Fungible DPU and TrueFabric, the company is addressing traditional challenges with a non-traditional architecture.
The basics of the Fungible storage cluster come down to three main components. First, there is a control plane run by the Fungible Composer nodes. Second, scaling out data storage involves adding more Fungible FS1600 DPU nodes, which we will discuss later in this article along with a few hands-on photos. Third, the solution uses NVMe-oF to connect to client servers, making it relatively easy to integrate.
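Since the front end is standard NVMe-oF, attaching a volume from a Linux client is largely stock plumbing. As a minimal sketch, assuming an NVMe/TCP transport with a made-up target address and volume NQN (Fungible does not publish these specifics here), the client side could look something like this using the standard nvme-cli tool:

```python
import subprocess

# Hypothetical values for illustration only -- not Fungible-published specifics.
TRADDR = "192.0.2.10"                     # data-port IP of a storage node (example)
VOL_NQN = "nqn.2020-10.example.com:vol0"  # example volume NQN

# Ask the discovery controller which subsystems it exposes
# (8009 is the standard NVMe/TCP discovery port).
subprocess.run(["nvme", "discover", "-t", "tcp", "-a", TRADDR, "-s", "8009"], check=True)

# Connect to the volume; it then appears as an ordinary /dev/nvmeXnY block
# device that any application can use as if it were local NVMe storage.
subprocess.run(["nvme", "connect", "-t", "tcp", "-a", TRADDR, "-s", "4420", "-n", VOL_NQN], check=True)
```

The point is that the client needs nothing Fungible-specific to consume the storage.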
The Fungible Composer runs all of the management and control planes for the solution. This is important because the abstraction of the data plane from the management/control planes allows the solution to scale out more efficiently. We are told the Fungible Composer nodes are more standard servers, since these are tasks where traditional models work well; the FS1600 data plane nodes are the ones with two DPUs each.
In terms of performance, here is what Fungible is claiming at a high level:
We got some hands-on time with the solution. Here is a view of an FS1600 hitting over 10M IOPS and over 39GB/s using only five server nodes as clients. One can reach higher performance with more servers making requests against the system.
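For context, here is roughly what a client-side fio job behind numbers like these could look like. This is a sketch with our assumed parameters (device path, block size, queue depth, job count); it is not Fungible's actual demo job:

```python
import subprocess

# Illustrative 4K random-read job against an NVMe-oF attached namespace.
# Every parameter here is an assumption, not Fungible's exact demo setup.
subprocess.run([
    "fio",
    "--name=randread",
    "--filename=/dev/nvme1n1",  # the NVMe-oF namespace on this client (example path)
    "--rw=randread",
    "--bs=4k",
    "--ioengine=libaio",
    "--direct=1",               # bypass the page cache to measure the device
    "--iodepth=128",
    "--numjobs=8",
    "--runtime=60",
    "--time_based",
    "--group_reporting",
], check=True)
```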
Fungible has a number of performance claims. One of the more interesting is faster database performance and lower latency than traditional direct-attached storage. Here is the MySQL slide, but Fungible is publishing Cassandra numbers as well.
One of the key partnerships Fungible is disclosing is with IBM around Spectrum Scale. With a 6-node Fungible cluster, one can reach 30M GPFSperf IOPS.
Partnerships with companies such as IBM are important for solutions like Fungible's. In a crowded field of storage startups, deals with large companies help sales and marketing efforts, which in turn give even well-funded startups like Fungible a path to revenue. On a relative scale, good database performance shows the viability of the solution, but large sales and distribution partnerships with existing channels such as IBM's are harder to secure and more important.
While the cluster is cool, we wanted to show off the actual DPU-based nodes themselves. As a result, we headed over to the Fungible HQ to see the Fungible FS1600 hands-on. That is also where we got the UI and fio performance demonstration screenshot above.
Fungible FS1600 Hands-on with a DPU Node
In a former MIPS Silicon Valley building, we found the Fungible headquarters. During our time there, we were able to see the Fungible FS1600 DPU-based scale-out storage node hands-on. We also have more on this and an interview with the company’s founder in the video linked above.
A Fungible FS1600 storage node is capable of 15M IOPS and up to 60GB/s of throughput. Please note, that is gigabytes, not gigabits, per second.
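As a quick sanity check, those two headline figures are consistent with each other if we assume 4KiB IOs (Fungible does not state the IO size, so this is our assumption):

```python
# Assumption: 4 KiB IOs; Fungible does not publish the IO size for these figures.
iops = 15_000_000
io_size_bytes = 4 * 1024
print(f"{iops * io_size_bytes / 1e9:.1f} GB/s")  # ~61.4 GB/s, in line with the quoted 60GB/s
```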
The FS1600, at some level, looks like a standard 24-bay 2.5″ NVMe enclosure of the kind we review regularly at STH. Normally, we get excited over developments such as when Dell and AMD Showcase Future of Servers 160 PCIe Lane Design in a traditional server. Here, Fungible is doing something completely different from what one may expect coming from a traditional server background.
The system itself has two Fungible DPUs on a single PCB. This is not a dual-socket system in the traditional sense. Instead, each DPU works independently with its own memory, NVDIMM-N, 6x 100GbE links, and 12x of the front panel NVMe SSDs. There are a few shared features, such as the ASPEED AST2500 baseboard management controller (or BMC) along with an Intel-Altera FPGA for features such as status LED blinking and enclosure management. The other major shared features are power and cooling: there are two power supplies for redundancy and five chassis fans.
The two DPUs effectively run as independent nodes for purposes of running the scale-out data plane. Each DPU is specifically designed for scale-out storage and to accelerate storage and retrieval of data on the NVMe SSDs. If you want to learn more about the Fungible F1 DPU features and architecture, see our Fungible F1 DPU article.
Here is another look at the system. There is a COM card with a traditional embedded node in the center of the chassis. We are using a revision 2 chassis for this hands-on, so this card, which was mostly for bring-up and development, is present. In future generations, we are told it will be removed.
A few quick notes here. First, the DPU heatsinks are large. After seeing this unit, I looked at an AMD EPYC heatsink and fan and would say they are the same order of magnitude in size. These are not simple low-end PCIe NIC heatsinks. Instead, the correct way to view the Fungible DPU is as a combination of a specialized processor, accelerators, and connectivity. One will notice that there are no standard PCIe slots in this system, which means there are no PCIe NICs or storage controllers. The main serviceable component types are:
- 24x NVMe SSDs
- 2x Boot storage modules
- 2x Power supplies
- DDR4 RDIMMs
- NVDIMM-Ns with their energy storage modules (Agigatech in this system)
The level of integration with the DPU versus a traditional Arm or Intel/AMD server is much higher.
On the official Fungible documentation, we see there are 6x 100GbE ports on the back of the system per DPU. In this revision, there are actually ten 100Gbps ports per DPU for 20 total. This almost looks like something we would expect from an early 100Gbps switch rather than a storage appliance. That is perhaps the point, though. Traditional servers struggle with the memory bandwidth, and certainly the inter-socket bandwidth, constraints of high-end NVMe storage and high-end networking. We have features such as RDMA/RoCE to help with this, but there are fundamental bandwidth limitations we run into. The DPU is designed for this application, so it is built around these bandwidth challenges. In addition, accelerators for networking as well as encryption/compression can be added with line-rate performance targets.
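To put the port count in perspective, here is a rough per-DPU bandwidth budget using the six documented 100GbE ports. The per-drive throughput is our assumption for a typical NVMe SSD of this era, not a Fungible specification:

```python
# Per-DPU back-of-envelope budget (assumptions noted inline).
net_gb_s = 6 * 100 / 8    # six 100GbE ports = 600Gbps = 75 GB/s of network bandwidth
drives_gb_s = 12 * 3.0    # assume ~3 GB/s sequential reads per NVMe SSD, 12 drives per DPU
print(net_gb_s, drives_gb_s)  # 75.0 vs 36.0: network bandwidth exceeds raw drive reads,
                              # leaving headroom for replication and other east-west traffic
```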
Fungible offers a number of different options focused on storage density. One can configure higher-capacity drives with less performance per GB, or lower-capacity drives with more performance per GB.
Again, if you want to hear it from Fungible, in the video linked above we have Benny Siman-Tov, the VP of Products for Fungible, going over the solution.
Next-Step Client TrueFabric with Host-Side DPUs
What we did not get to cover is the host-side DPU. If you remember the TrueFabric discussion above, Fungible’s vision extends beyond the storage solution running on standard 100GbE networks. Fungible also envisions tightly controlling features such as quality of service using TrueFabric.
We somewhat knew this was coming since, at the company’s Hot Chips 32 presentation, we saw that there would be a Fungible S1 DPU.
Having a DPU on the host side also allows for features such as turning an x86 or Arm server into a bare-metal resource that can be provisioned as an entire server. The host-side DPU can present itself to the server as a network interface and an NVMe device, and then TrueFabric and the Fungible Storage solution can orchestrate storage and parts of the provisioning process. The industry is moving toward a model with these host cards after AWS Nitro paved the way, so for cloud providers that need similar infrastructure, Fungible will have a DPU-based solution.
Final Words
First, I want to thank the Fungible team for arranging this. It was great to not only see the slideware but also to see a system in person.
A few items truly stood out on the hardware side. The DPU heatsinks were very large. It is right to think of these as in line with modern CPUs or GPUs in terms of what they are accomplishing. Perhaps the most striking part was seeing a chassis that looks like a server, and that we would normally expect to house a standard server platform, yet has no PCIe expansion. That was a different experience. Designing something different is a means; the end is delivering high performance at scale, which we saw in the performance demo.
Stay tuned for more Fungible and DPU coverage on STH. Our next goal is to see if we can look at the Fungible S1 host-side solution.