Kalray K200-LP DPU with Coolidge MPPA3-80 DPU Chip at FMS 2022

Kalray K200-LP DPU Cover

At FMS 2022, we saw a new DPU. Two products at the show actually claimed to be DPUs: Kalray’s was one, while the other company was showing what we would call a fake DPU. As a result, we wanted to show Kalray’s DPU solution and explain why we are going to classify it as a DPU even though it uses something very different from the others on the market.

Most of us will see the Kalray K200 DPU as the K200-LP, a low-profile PCIe Gen4 x16 card.

Kalray K200 LP DPU Front

Onboard is the Kalray MPPA3-80 DPU chip. We were told that this chip runs two Linux environments simultaneously to provide DPU functionality, which is something very different.

Kalray K200-LP DPU Cover

Here is the back side of the card, where we can see more memory packages.

Kalray K200 LP DPU Rear

Here are the two QSFP28 100GbE ports.

Kalray K200 LP DPU QSFP28 Ports
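
As a rough sanity check on how the two 100GbE ports relate to the PCIe Gen4 x16 host interface, here is a minimal back-of-the-envelope sketch. The PCIe figures (16 GT/s per lane, 128b/130b encoding) are general PCIe Gen4 numbers rather than anything Kalray told us, protocol overheads are ignored, and the function names are purely illustrative.

```python
# Back-of-the-envelope comparison of network line rate vs. PCIe host bandwidth.
# PCIe Gen4 figures (16 GT/s per lane, 128b/130b encoding) are generic PCIe
# numbers, not Kalray-provided data. Packet and protocol overheads are ignored.

PORTS = 2                 # QSFP28 ports on the K200-LP
PORT_GBPS = 100           # 100GbE per port

PCIE_LANES = 16           # x16 card
PCIE_GEN4_GTPS = 16       # giga-transfers per second per lane
ENCODING = 128 / 130      # 128b/130b line-encoding efficiency


def network_gbps(ports: int = PORTS, port_gbps: int = PORT_GBPS) -> int:
    """Aggregate network line rate in Gbps."""
    return ports * port_gbps


def pcie_gbps(lanes: int = PCIE_LANES,
              gtps: int = PCIE_GEN4_GTPS,
              encoding: float = ENCODING) -> float:
    """Approximate PCIe bandwidth per direction in Gbps."""
    return lanes * gtps * encoding


if __name__ == "__main__":
    net = network_gbps()
    host = pcie_gbps()
    print(f"Network: {net} Gbps, PCIe Gen4 x16: {host:.1f} Gbps per direction")
    print(f"Headroom: {host - net:.1f} Gbps")
```

Under these assumptions, the host link has somewhat more raw bandwidth than the two network ports combined, which is what one would hope for on a card that moves storage traffic between the network and the host.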

What makes the Kalray MPPA3-80 DPU so interesting is that it is not using Arm, MIPS, or x86. Instead, it uses the company’s own Coolidge cores and has 80 of them on the chip. The cores are surrounded by caches, accelerators, and other devices. Each chip is made up of five clusters of 16 cores. Kalray has one of these 16-core clusters running the card’s management function in its own Linux environment. The company told us that, for applications, the other 64 cores run another Linux environment. Since these are not mainline cores, the Linux distribution is custom compiled for the Kalray cores.

Kalray MPPA3 80 DPU Overview
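
To make the core split easier to follow, here is a small sketch of the arithmetic as it was described to us: five clusters of 16 cores, with one cluster reserved for management. The `ClusterLayout` class and its field names are hypothetical and exist only for illustration; they are not part of any Kalray SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical model of the MPPA3-80 core layout described in the article."""
    clusters: int = 5              # clusters per chip
    cores_per_cluster: int = 16    # cores in each cluster
    management_clusters: int = 1   # clusters reserved for card management

    @property
    def total_cores(self) -> int:
        return self.clusters * self.cores_per_cluster

    @property
    def management_cores(self) -> int:
        return self.management_clusters * self.cores_per_cluster

    @property
    def application_cores(self) -> int:
        return self.total_cores - self.management_cores


if __name__ == "__main__":
    layout = ClusterLayout()
    print(f"Total cores:       {layout.total_cores}")
    print(f"Management cores:  {layout.management_cores}")
    print(f"Application cores: {layout.application_cores}")
```

With the defaults above, the numbers line up with what we were told: 80 cores in total, 16 for management, and 64 for applications.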

This is an interesting one because we have typically only seen Arm, MIPS, and x86 solutions in this space. Kalray has something different, so we wanted to run it through our DPU framework outlined in What is a DPU? A Data Processing Unit Quick Primer.

  • High-speed networking connectivity (usually multiple 100Gbps-200Gbps interfaces in this generation) – 2x 100GbE on the K200-LP.
  • High-speed packet processing with specific acceleration and often programmable logic (P4/P4-like is common) – This is left to the cores, and it seems to be less of a focus for Kalray since the company is focused on storage.
  • A CPU core complex (often Arm or MIPS based in this generation) – These are the five Coolidge core clusters with 80 cores in total.
  • Memory controllers (commonly DDR4 but we also see HBM and DDR5 support) – The K200-LP has DDR4-3200 support and we can see the memory on the card.
  • Accelerators (often for crypto or storage offload) – Each cluster has these on the MPPA3-80.
  • PCIe Gen4 lanes (run as either root or endpoints) – There is an x16 interface on the chip and card.
  • Security and management features (offering a hardware root of trust as an example) – This is not the focus of the card, but it does offer the second environment for managing the infrastructure, so we are giving it a pass here.
  • Runs its own OS separate from a host system (commonly Linux, but the subject of VMware Project Monterey ESXi on Arm as another example) – Here we confirmed the card is running two Linux OSes.
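
The checklist above can also be written down as a small data structure, which makes it easier to compare cards side by side. This is only a sketch: the `Verdict` labels ("met" versus "partial") are our own shorthand for the notes above, not an official scoring scheme, and all of the names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    MET = "met"
    PARTIAL = "partial"  # present, but noted above as not being a focus


@dataclass(frozen=True)
class Criterion:
    name: str
    verdict: Verdict
    note: str


# Verdicts are an interpretation of the bullet notes, not vendor data.
K200_LP_CHECKLIST = [
    Criterion("High-speed networking", Verdict.MET, "2x 100GbE"),
    Criterion("Packet processing acceleration", Verdict.PARTIAL, "left to the cores"),
    Criterion("CPU core complex", Verdict.MET, "80 cores in five clusters"),
    Criterion("Memory controllers", Verdict.MET, "DDR4-3200"),
    Criterion("Accelerators", Verdict.MET, "in each cluster"),
    Criterion("PCIe Gen4 lanes", Verdict.MET, "x16 interface"),
    Criterion("Security and management", Verdict.PARTIAL, "separate management environment"),
    Criterion("Own OS separate from host", Verdict.MET, "Linux on the card"),
]


def summarize(checklist: list[Criterion]) -> tuple[int, int]:
    """Return (criteria fully met, total criteria)."""
    met = sum(1 for c in checklist if c.verdict is Verdict.MET)
    return met, len(checklist)


if __name__ == "__main__":
    for c in K200_LP_CHECKLIST:
        print(f"{c.name:32} {c.verdict.value:8} {c.note}")
    met, total = summarize(K200_LP_CHECKLIST)
    print(f"{met} of {total} criteria fully met")
```

Keeping the note next to each verdict preserves the nuance in the list above, since a plain pass/fail would hide the fact that packet processing and security are present but not the focus.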

This seems to be a storage-focused DPU. Like the Fungible solution, it is aimed more at providing a storage alternative to a traditional CPU-based system than at creating an infrastructure-wide solution. Still, we are going to include it in our DPU coverage going forward since it is close to what we would call a DPU.

Kalray DPU Solutions

The company also showed storage solutions based on its DPU. One was the Kalray Flashbox that we think is made by Viking.

Kalray K200 LP DPUs In Viking Flashbox

This is powered by two nodes, each with four Kalray K200-LP DPUs.

Kalray K200 LP DPU X4 In Viking Flashbox

At the rear of the box, we can see that there are two of these four-DPU controller nodes. In this picture, one of the nodes had only three DPUs installed.

Kalray K200 LP DPU Flashbox Rear

Atop the hardware, Kalray runs its SPDK-based software, which is accelerated by the DPUs.

Kalray K200 LP DPU In Flashbox Storage Interface

The other solution was from Pixitmedia, and was that company’s PixStor box.

Kalray K200 LP DPU PixStor Box

PixStor uses DPUs in the company’s NVMe tier for adding higher-speed storage to the overall solution.

Kalray K200 LP DPU PixMedia Architecture

Where Fungible focused more on selling its own solution, Kalray also seems to be looking at more OEM opportunities.

Final Words

This is one of those really interesting solutions because it is not using an Arm or x86 CPU. On one hand, for a card that is just doing storage, using something different may make sense. It is great to see different types of technologies in the marketplace.

Kalray K200-LP DPU Cover

At the same time, we can also see the benefit of using Arm or another more general-purpose architecture: such a DPU can be more easily maintained and extended in the future. For flexibility, we looked at the FPGA plus Intel Xeon D IPU that JD.com is using in This Changes Networking Intel IPU Hands-on with Big Spring Canyon. We also did a hands-on with ZFS without a Server Using the NVIDIA BlueField-2 DPU.

It will be interesting to see how Kalray fares in the market with its very different DPU.

8 COMMENTS

  1. It’s highly likely that those Coolidge cores are just tweaked ARM cores.

    I don’t see anyone doing a whole new backend on open-source tool chains (gcc/llvm/gdb,binutils) without that being reflected in their source.
    They might have done some tweaks to existing backends to accommodate some special instructions that use special-purpose hardware onboard and that’s it.

  2. @Onibra

    Hi, it’s not an ARM architecture. It is a patented MPPA technology from CEA Leti (Atomic Energy Commission, Grenoble, France), one of the main microelectronics laboratories in the world, at the origin of the company STMicroelectronics among others.

  3. interesting article despite few mistakes…
    ex : there’s a single instance of linux running on one of the clusters, and it’s only dedicated to control and management plane. Other clusters are running a lightweight run-to-completion proprietary OS (called ClusterOS) with libc and minimal pthread support on top of which SPDK has been ported.
    And indeed cores are proprietary VLIW with dedicated instructions to accelerate compute intensive tasks like Erasure Coding, AI (ex CNN).. with support for GCC, LLVM … and upstreaming is in WIP 😉

  4. @Onibra

    Apparently, they do: by looking at their github they have ported binutils, gcc, llvm, linux, gdb, …
    Their gcc is even available on godbolt (under the name KVX GCC. why ?), and it really does look like a VLIW instead of ARM instruction set.

  5. @Nix
    In fact Coolidge (aka MPPA 3) is the name of the SoC, not the name of the core’s architecture.
    The architecture of the core(s) is named kv3-1 (meaning 3rd generation, version 1), and it’s of the kvx family thus the kvx name everywhere.

  6. @Onibra

    Tell me you don’t know anything about Kalray without telling me you don’t know anything about Kalray.
