One of the biggest topics coming to the server industry in 2022 will be Compute Express Link or CXL. We started covering CXL when it was first announced in 2019. The project grew from an Intel effort into a broad industry consortium, and it is simply where the industry is heading as we enter the 2022 PCIe Gen5 era. Still, many do not necessarily understand CXL, so we are going to try explaining it using an easy visual example: limes.
A Video Version
Since we are going to be talking about CXL for years to come, we have a video version of this piece that you can find here.
As always, we suggest opening this in a YouTube tab/window for a better viewing experience.
The Quick "What is CXL?" Answer
Compute Express Link is a cache-coherent link meant to help systems, especially those with accelerators, operate more efficiently. CXL sits atop the PCIe Gen5 link infrastructure. There can be plain PCIe Gen5 devices, but many PCIe Gen5 (or PCIe 5.0) devices will also support the ability to run their lanes in either PCIe or CXL mode. In some ways, this is analogous to how AMD EPYC CPUs can run their I/O lanes as either PCIe or Infinity Fabric, except that instead of being a vendor-specific implementation, CXL is going to be a broad industry standard. The key advantage of CXL is that it allows load/stores to happen to memory that is directly attached to different endpoints, which is what we are going to get into next.
We are going to focus a lot on memory here. Memory is a key driver because hyper-scalers realized that almost every device these days has onboard memory, and that memory is both a large cost driver and often stranded, under-utilized by the device it is attached to. At a higher level, CXL is a tool for systems to efficiently share and utilize memory across a number of devices. We are discussing these concepts in the context of servers today, but there is no reason CXL cannot be applied beyond servers.
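To make the memory angle concrete, here is a minimal sketch of what this looks like to software. On Linux, CXL-attached (Type 3) memory is typically exposed as a CPU-less NUMA node; assuming it shows up as node 1 (a hypothetical value, check `numactl --hardware` on a real system), an application can place data on it via libnuma and use it with ordinary load/stores:

```c
/* Minimal sketch: treat CXL-attached memory as a CPU-less NUMA node.
 * The node number below is an assumption for illustration.
 * Build with: gcc cxl_numa.c -o cxl_numa -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    const int cxl_node = 1;      /* assumed CXL memory node (hypothetical) */
    const size_t len = 1 << 20;  /* 1 MiB */

    /* Allocate pages backed by the CXL memory node. */
    char *buf = numa_alloc_onnode(len, cxl_node);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }

    /* Ordinary CPU load/stores; the CXL.mem transactions to the
     * device-attached media happen transparently underneath. */
    memset(buf, 0xA5, len);
    printf("first byte: 0x%02x\n", (unsigned char)buf[0]);

    numa_free(buf, len);
    return 0;
}
```

The point of the sketch is that nothing in the application has to change: the CXL memory is simply another pool the OS can hand out.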
Why is CXL not used in the PCIe Gen3/Gen4 era?
There are a few reasons we are waiting until PCIe 5.0 for CXL. The first is simply timing. CXL was introduced in 2019, which means late 2021 and early 2022 bring the first generations of chips in which we expect to see support. These things take time.
A key enabler, though, is the rapid adoption of PCIe Gen5. It took the industry around seven years to transition from PCIe Gen3 to Gen4, while the PCIe Gen5 transition is occurring only about three years after Gen4. Gen5 effectively doubles the bandwidth of PCIe Gen4. Since CXL sits atop PCIe Gen5, we generally see a controller with a 32GT/s x16 link, but CXL can also support bifurcation to x8 and x4. There are also modes to go to x2 and x1 links in what is called “Degraded” mode. It is important to remember that a PCIe Gen5 x4 link has enough bandwidth to handle a 100GbE link. That starting level is roughly the bandwidth of a PCIe Gen3 x16 link, which Intel was still using as a top-end connectivity option in Q1 2021.
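For those who want to check that math, here is a quick back-of-the-envelope sketch. PCIe Gen3 through Gen5 all use 128b/130b encoding, so usable bandwidth per direction is roughly the line rate times 128/130, divided by 8 bits per byte, times the lane count (real links lose a bit more to protocol overhead than this raw line-rate math shows):

```c
/* Back-of-the-envelope PCIe bandwidth, per direction. */
#include <stdio.h>

/* GB/s for a PCIe Gen3/4/5 link: line rate in GT/s, 128b/130b
 * encoding, 8 bits per byte, times the lane count. */
static double pcie_gbps(double gt_per_s, int lanes) {
    return gt_per_s * (128.0 / 130.0) / 8.0 * lanes;
}

int main(void) {
    printf("PCIe Gen5 x16: %6.2f GB/s\n", pcie_gbps(32.0, 16)); /* ~63.0  */
    printf("PCIe Gen5 x4 : %6.2f GB/s\n", pcie_gbps(32.0, 4));  /* ~15.75 */
    printf("PCIe Gen3 x16: %6.2f GB/s\n", pcie_gbps(8.0, 16));  /* ~15.75 */
    printf("100GbE       : %6.2f GB/s\n", 100.0 / 8.0);         /* 12.5   */
    return 0;
}
```

A Gen5 x4 link (~15.75GB/s) lands almost exactly where a Gen3 x16 link does, comfortably above the 12.5GB/s a 100GbE port can carry.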
The additional bandwidth means that PCIe will have enough headroom to handle a new class of use cases, specifically those involving remote memory load/stores that previous generations were not fast enough to handle.
For those wondering why this did not happen sooner, there are three main reasons. First, we needed the speed of PCIe Gen5. Second, we needed CXL to be created. Third, it takes time for industry adoption to occur. Now that we have all three, CXL is the future.
Compute Express Link Protocol Trifecta
CXL uses three main protocols:
- CXL.io is the protocol used for initialization, link-up, device discovery and enumeration, and register access. It provides a non-coherent load/store interface for I/O devices, similar to PCIe Gen5, and support for CXL.io is mandatory for all CXL devices. (See the discovery sketch after this list.)
- CXL.cache is the protocol that defines interactions between a Host (usually a CPU) and a Device (such as a CXL memory module or accelerator). This allows attached CXL Devices to cache Host memory with low latency. Think of this as a GPU directly caching data stored in the CPU’s memory.
- CXL.memory / CXL.mem is the protocol that provides a Host processor (usually a CPU) with direct access to Device-attached memory using load/store commands. Think of this as the CPU using a dedicated storage-class memory Device or using the memory found on a GPU/accelerator Device.
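As a rough illustration of the discovery side of CXL.io mentioned above: CXL devices advertise themselves through a PCIe Designated Vendor-Specific Extended Capability (DVSEC) carrying the CXL Consortium vendor ID (0x1E98). The sketch below walks one device's extended config space as exposed by Linux sysfs; the offsets follow the published PCIe/CXL specs, but treat the code as illustrative rather than production-ready (run as root, and the device path is a hypothetical example):

```c
/* Sketch: scan one PCI device's extended config space for a CXL DVSEC.
 * Usage: ./cxl_scan /sys/bus/pci/devices/0000:17:00.0/config
 * (the BDF above is hypothetical) */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_START 0x100   /* extended capabilities begin here */
#define CAP_ID_DVSEC  0x0023  /* Designated Vendor-Specific Ext. Cap. */
#define CXL_VENDOR_ID 0x1E98  /* CXL Consortium vendor ID */

static uint32_t rd32(const uint8_t *cfg, int off) {
    uint32_t v;
    memcpy(&v, cfg + off, 4);
    return v;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <config-path>\n", argv[0]); return 1; }

    uint8_t cfg[4096] = {0};
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    size_t n = fread(cfg, 1, sizeof cfg, f);
    fclose(f);
    if (n <= EXT_CAP_START) {
        fprintf(stderr, "no extended config space visible (need root?)\n");
        return 1;
    }

    /* Walk the extended capability linked list. */
    for (int off = EXT_CAP_START; off; ) {
        uint32_t hdr = rd32(cfg, off);
        if ((hdr & 0xFFFF) == CAP_ID_DVSEC) {
            uint16_t vendor   = rd32(cfg, off + 4) & 0xFFFF; /* DVSEC hdr 1 */
            uint16_t dvsec_id = rd32(cfg, off + 8) & 0xFFFF; /* DVSEC hdr 2 */
            if (vendor == CXL_VENDOR_ID)
                printf("CXL DVSEC id %u at offset 0x%03x\n", dvsec_id, off);
        }
        off = (hdr >> 20) & 0xFFF; /* next capability offset, 0 ends list */
    }
    return 0;
}
```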
CXL.io is mandatory to get an endpoint on CXL, but from there we can have CXL.io and any of the three combinations of CXL.cache and/or CXL.mem. The CXL folks use Type 1 (CXL.io + CXL.cache), Type 2 (all three), and Type 3 (CXL.io + CXL.mem) as examples.
Think of Type 1 as an accelerator, like a NIC, that accesses the host CPU’s memory directly. Type 2 is a case where we have an accelerator, like a GPU with its own memory, where the GPU can access host memory and the host CPU can access GPU memory. Think of Type 3 as a memory module whose primary purpose is to expose the CXL memory device’s media to the host CPU.
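Since CXL.io is mandatory, the Type 1/2/3 taxonomy falls out of which of the two optional protocols a device supports. A tiny sketch of that mapping:

```c
/* Classify a CXL device type from its optional protocol support.
 * CXL.io is mandatory for every device, so only .cache and .mem vary. */
#include <stdio.h>

typedef enum { CXL_TYPE_UNKNOWN, CXL_TYPE_1, CXL_TYPE_2, CXL_TYPE_3 } cxl_type;

static cxl_type classify(int has_cache, int has_mem) {
    if (has_cache && has_mem) return CXL_TYPE_2; /* e.g. GPU with memory  */
    if (has_cache)            return CXL_TYPE_1; /* e.g. caching NIC      */
    if (has_mem)              return CXL_TYPE_3; /* e.g. memory expander  */
    return CXL_TYPE_UNKNOWN;                     /* plain CXL.io device   */
}

int main(void) {
    printf("cache only  -> Type %d\n", classify(1, 0));
    printf("cache + mem -> Type %d\n", classify(1, 1));
    printf("mem only    -> Type %d\n", classify(0, 1));
    return 0;
}
```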
CXL 1.1 v. 2.0
Most of the early 2022 devices we see will utilize CXL 1.1. These are still primarily host-CPU-managed topologies where CXL is used within a single system. CXL 2.0 is where we get really interesting use cases. With CXL 2.0 we get CXL switching: switching and pooling allow multiple hosts and multiple devices to be connected to a switch, and devices can then be assigned to different hosts, either as complete devices or as logical devices. CXL 2.0 is where we will start to see the game-changing deployment scenarios.
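As a conceptual sketch only (not a real API), one can model CXL 2.0 pooling as a multi-logical device (MLD) behind a switch, with a fabric manager binding its slices to hosts; all of the names here are our own for illustration:

```c
/* Toy model of CXL 2.0 pooling: one pooled memory device carved into
 * logical devices (LDs) that get bound to different hosts. */
#include <stdio.h>

#define MAX_LDS 4

struct logical_device {
    unsigned gib;    /* capacity of this slice in GiB */
    int bound_host;  /* -1 = unassigned (still in the pool) */
};

struct mld {         /* one multi-logical device behind the switch */
    struct logical_device lds[MAX_LDS];
};

/* Bind the first free logical device to a host; returns LD index or -1. */
static int bind_ld(struct mld *dev, int host) {
    for (int i = 0; i < MAX_LDS; i++) {
        if (dev->lds[i].bound_host < 0) {
            dev->lds[i].bound_host = host;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

int main(void) {
    struct mld dev = {{ {256, -1}, {256, -1}, {256, -1}, {256, -1} }};
    printf("host 0 got LD %d\n", bind_ld(&dev, 0));
    printf("host 1 got LD %d\n", bind_ld(&dev, 1));
    return 0;
}
```

The real mechanism lives in the switch and fabric manager rather than in host code, but the ownership model is the same: capacity moves between hosts without physically moving DIMMs.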
Beyond switching and pooling, CXL 2.0 also adds additional security layers to the protocol.
While CXL 1.0/1.1 will be big to start the initiative, CXL 2.0 is where we start to see truly game-changing functionality shifts, primarily because it lets us start changing how server architectures and deployments happen. CXL 1.1 is necessary, and there are use cases, especially with accelerators and GPUs. CXL 2.0, though, is hopefully where we have enough software maturity that server designs change forever.
Next, we are going to discuss some examples and use our tacos and limes to help those who have difficulty visualizing common types of CXL usage from slides alone.