Ventana has a new RISC-V data center processor design. The Ventana Veyron V2 is a new processor aimed at intercepting a fundamental shift in data center compute, which makes it interesting. A few months ago, we covered the Ventana Veyron V1 RISC-V data center processor at Hot Chips 2023. I was able to speak with Balaji Baktha (Founder and CEO of Ventana) and his team about the V1 and the vision for the new V2 processor.
What Happened to Ventana Veyron V1?
The Veyron V1 seemed to offer solid performance and many of the features one would want in the data center with better performance per watt than much of the industry. Perhaps the first question is, where is the Veyron V1?
After speaking with the team, my gut feeling is that Ventana felt it needed another revision before really turning its chips into a commercial product. Interestingly, our discussion did not focus much on challenges with the V1 core itself. Instead, it centered on the platform and enablement work that surrounds a server processor. As an example, it has been about 20 months since the NVIDIA GTC 2022 Keynote where NVIDIA launched its Arm-based Grace CPUs. Those CPUs were running as pre-production silicon in the lab about 12 months ago and are just starting to make it into shipping systems today. Grace is built on Arm, a relatively mature data center ecosystem compared to RISC-V. In the past year or so, RISC-V’s maturity has been rapidly increasing, and my sense is that Ventana is trying to time its chip launch for when it can land volume deals with hyper-scalers.
While many people assume that a company can design a chip, send it to TSMC or another foundry, have it cut and packaged, and then everything works, that is unrealistic. There is a ton of platform work that has to be done on any new chip design. My sense after the conversation was that Ventana learned a few things by showing V1 to customers and the platform maturity increased, so it was time for V2 to be the go-to-market part.
We are going to discuss some of those reasons for a generational skip as we go through Veyron V2.
Ventana Veyron V2 RISC-V CPU Launched for the DSA Future
Let us pivot to the Ventana Veyron V2. Ventana uses a chiplet approach with an IO hub and accelerators coupled with UCIe to achieve 192 cores per socket. While its performance per core may not reach Zen 4c levels, it is focusing on UCIe and domain-specific acceleration (DSA) to provide a more modern computing platform.
Two clear examples of what skipping the productization of V1 provides are support for RVA23 (the RISC-V profile that includes the vector extension) and a UCIe chiplet interface. Instead of building its own vector extensions, Ventana is putting a lot of focus on trying to make Veyron V2 a standards-based chip.
Veyron V2 also increases the performance per core while offering 32 cores and up to 128MB of L3 cache per cluster. It also implements AMBA CHI, which is something we see on many Arm CPUs.
Getting into the caches, here is the slide with 512KB I cache, 128KB D-cache, and a 1MB L2 D-cache. Unlike some Intel and Arm designs we have seen in recent years, there is still an L3 cache.
Each cluster of 32 cores is built on a chiplet that is then connected to an I/O hub. With six of these chiplets, Ventana can get to 192 cores.
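As a quick back-of-the-envelope check on those figures, here is a minimal Python sketch. It assumes every chiplet is fully populated with 32 cores and carries the maximum 128MB of L3; the "up to" wording means real parts may ship with less. The constant and function names are ours for illustration, not Ventana's.

```python
# Figures as described in Ventana's slides.
CORES_PER_CHIPLET = 32
L3_MB_PER_CHIPLET = 128    # "up to" figure; assumed fully populated here
CHIPLETS_PER_SOCKET = 6

def socket_totals(chiplets=CHIPLETS_PER_SOCKET):
    """Return core count and L3 capacity for a socket with `chiplets` compute chiplets."""
    cores = chiplets * CORES_PER_CHIPLET
    l3_mb = chiplets * L3_MB_PER_CHIPLET
    return {
        "cores": cores,
        "l3_mb": l3_mb,
        "l3_mb_per_core": l3_mb / cores,
    }

totals = socket_totals()
print(totals)
```

Under those assumptions, a six-chiplet socket works out to 192 cores with 4MB of L3 per core.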
Another big feature is RAS, with ECC capabilities, data poisoning, and so forth.
These days, a data center processor needs to have Secure Boot and Authentication. A chiplet CPU also needs to do chiplet authentication.
Now, let us get into the magic, the chiplet approach. Ventana is using UCIe to connect to an I/O hub that has DDR and PCIe controllers. UCIe is going to be a force in the industry, and this diagram should help explain why. We do not see Ventana with CPU core only compute chiplets. Instead, these slides all show domain-specific acceleration chiplets as well.
There are other features that will come into this version, such as an IOMMU, which is important on modern CPUs. Something that we probably take for granted is that as new features like this are introduced over the years, it is not just the CPUs that need to be designed for them. There is also an entire systems enablement effort to ensure that features like the IOMMU work with other hardware. Longtime readers may remember our Dell PowerEdge R7415 NVMe Hot Swap AMD EPYC in Action piece because it took months for hot swap NVMe, originally built on Intel Xeon, to work on AMD EPYC. We think of that as a foundational feature today, but in modern CPUs, there is a lot going on.
Ventana is also supporting RISE. RISC-V is sometimes compared to the wild west since, in theory, one can do just about anything with CPUs. Ventana is a RISC-V design, but it wants to be a standards-based one for compatibility, so RISE support is important.
To be fair, Ventana can have a good RISC-V part, but being 5% faster than an alternative in a generation is not going to get companies to switch architectures. Instead, the company is banking on the idea of integrating acceleration chiplets (likely UCIe based) into I/O hubs as a central part of its strategy, giving its parts a different performance curve. For example, in a storage server, crypto and compression may be important to integrate. In a CDN server, perhaps that is a transcoding accelerator. The idea is that integrating these accelerators changes the curve. This is already common industry practice, just with the accelerators sitting on the PCIe bus for AMD, while companies like Ampere and Intel integrate them directly into the CPU.
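To make the "changes the curve" argument concrete, here is a rough sketch using Amdahl's law, a standard textbook model and not something Ventana presented. The 40% offloadable share and the 10x accelerator speedup are hypothetical numbers chosen only for illustration.

```python
def offload_speedup(accelerated_fraction, accelerator_speedup):
    """Amdahl's law: overall speedup when a fraction of the work runs faster.

    accelerated_fraction: share of runtime that benefits (0.0 to 1.0).
    accelerator_speedup: how many times faster that share runs.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)

# A general-purpose core that is 5% faster on the entire workload.
faster_core = offload_speedup(1.0, 1.05)

# Hypothetical storage server: 40% of CPU time is crypto and compression,
# offloaded to an accelerator chiplet assumed to be 10x faster at that work.
with_dsa = offload_speedup(0.4, 10.0)

print(f"5% faster core: {faster_core:.2f}x, with accelerator: {with_dsa:.2f}x")
```

With those made-up inputs, the accelerator case lands around 1.56x versus 1.05x for the faster core, which is the kind of gap that could justify an architecture switch. Real gains depend entirely on the workload mix and the offload overhead, which this model ignores.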
At this point, the idea of 32-core UCIe-enabled chiplets with room to add accelerator chiplets on package might sound exciting, but there is more.
Hyper-scalers want to add custom acceleration directly to the I/O hub. Ventana is already pitching its chiplets for use on customer-designed I/O hubs.
The chiplet side is really interesting because it allows for chips to be made more quickly. An FPGA can be added first, and then later replaced by an ASIC accelerator. That adds flexibility, but it also lowers the barrier to entry since it allows packages to be constructed using smaller IP blocks that use UCIe and an I/O hub.
Ventana’s goal is to have its customer designs use these DSA chiplets, whether FPGA or ASIC, to deliver better workload efficiency rather than just the maximum SPECint throughput.
Here is a great example of some of the DSA blocks. A really good one here is the infrastructure offloads.
At the end of Ventana’s slides, there was a server. This appears to be a Gigabyte platform that will be a single socket 192 core 1U server with 12-channel DDR5-5600. Do not put too much stock in the actual picture as it seems to be an Intel Skylake/Cascade Lake server that has a little bit of Photoshop magic around the CPU socket area.
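For a sense of what 12-channel DDR5-5600 means for a 192 core socket, here is a small Python sketch of the theoretical peak memory bandwidth. It assumes a 64-bit data path per channel, which is the usual convention for server DDR5 channel counts; sustained bandwidth in practice will be lower. The names are ours for illustration.

```python
# Theoretical peak DRAM bandwidth for the platform described above.
CHANNELS = 12
MEGATRANSFERS_PER_SEC = 5600   # DDR5-5600
BYTES_PER_TRANSFER = 8         # assumption: 64-bit data width per channel
CORES = 192

def peak_bandwidth_gbs(channels=CHANNELS,
                       mts=MEGATRANSFERS_PER_SEC,
                       width_bytes=BYTES_PER_TRANSFER):
    """Peak bandwidth in GB/s: MT/s times bytes per transfer gives MB/s per channel."""
    return channels * mts * width_bytes / 1000

peak = peak_bandwidth_gbs()
per_core = peak / CORES
print(f"{peak:.1f} GB/s peak, {per_core:.2f} GB/s per core")
```

That comes to roughly 537.6 GB/s per socket, or about 2.8 GB/s per core, which helps explain why offloading bandwidth-hungry work to accelerators is attractive at this core count.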
We asked about availability, and our sense is that we will not have one in the STH lab by Q1 2024 but hopefully later in 2024.
This is really something different. If you look at what Ampere and previously Marvell/Cavium tried to do in the server space, it was to compete directly with Intel on general-purpose compute platforms where one of the big optimizations was lowering floating point throughput on a superior TSMC process node to get better integer performance per watt. That is really the magic of Arm servers today. Ventana is doing something different, looking to be the CPU core for the UCIe era where it expects the market to move to acceleration. That feels a bit more like the Annapurna Labs / AWS model rather than what some of the well-known Arm players have been doing.
Now, the challenge I posed to the Ventana team is: when do we get one? To me, it feels more real when I can boot Ubuntu on a server and get going. I am hoping the answer is 2024.