Modern CPUs have a glaring problem, and AMD is taking its first steps to remedy it. The AMD EPYC Bergamo is the company’s newest chip, offering up to 128 cores in a single socket. Unlike the modern trend of building bigger, faster CPU cores, AMD is making something that is deliberately slower (on a per-core basis) but with characteristics we have not seen before. This is the start of x86 CPUs marching into the cloud-native compute realm formerly dominated by Arm CPUs. In this piece, we are going to get into it in depth.
“What is the Fastest Server CPU?” It Is Complicated
Five years ago, the answer to “what is the fastest public server CPU on the market?” was still basically a hunt for the highest-MSRP chip Intel sold. Times have changed. That question has become far more workload-specific. AMD now has three workload-specific EPYC processors in the AMD Socket SP5 platform alone.
In this article, we are going to focus on Bergamo, the high-core-count but lower-clock-speed and lower-cache variant. We already went in-depth on the mainstream Genoa part in AMD EPYC Genoa Gaps Intel Xeon in Stunning Fashion. We also have a video for this article covering the new chips and how they come together as a portfolio for AMD’s server business:
We are also going to briefly cover “Genoa-X,” as we did in the video accompanying this article. We will separately do a deep dive on that, and another deep dive on the Intel Xeon Max series with 64GB of HBM2e onboard that we have tested and mostly recorded a video for.
Genoa-X adds 3D V-Cache, similar to Milan-X and the desktop parts, bringing up to 1.1GB of L3 cache per CPU. For some perspective, a Twitter comment exclaimed that 1.1GB is enough to meet the minimum Windows 7 32-bit system requirements.
Since we have already gone into more depth on that technology, we are going to focus this article on the new Zen 4c underpinnings, and how AMD EPYC Bergamo fits into the portfolio.
The AMD EPYC Bergamo Recap and Something New
We went over this previously in AMD EPYC Bergamo Launch SKUs, but for completeness, we wanted to do a quick recap and show our readers something AMD did not say, but that we found on three different systems tested with three different chips. First, the new series is designed with either 112 or 128 cores.
AMD was able to shrink each CCD by adopting the new Zen 4c core. Zen 4c has half the L3 cache per core (2MB versus 4MB), which allowed the company to shrink the die area in a straightforward manner. AMD was also able to optimize things like trace lengths around the more compact die. As a result, each Zen 4c core occupies much less die area than a Zen 4 core.
One of AMD’s big selling points is that this is an L3 cache reduction, not a feature reduction. Intel, by contrast, is planning a move from P-cores to E-cores for Intel Sierra Forest. Zen 4c keeps AMD’s full x86 Zen 4 instruction set, not some stripped-down set of instructions and features like many Arm offerings we have seen to date.
AMD is also showing its chiplet strategy here. We get the same I/O die with PCIe Gen5 and DDR5 that we have seen on Genoa (and Genoa-X). That means this is a known quantity and helps a lot with platform validation. The DDR5 and PCIe Gen5 controllers are unchanged. In this case, the big change is transitioning from the 12x 8-core CCDs on 96-core Genoa to 8x 16-core CCDs on Bergamo.
That also means that this is still an AMD Socket SP5 CPU, so we were able to use it in a number of servers and motherboards we had in the lab.
In terms of the SKU stack, the AMD EPYC 9754 is the flagship part with 128 cores and 256 threads at $11,900. That is likely the best CPU if you provision VMs in increments of two vCPUs. For those who do not want or need SMT, there is the 128-core, 128-thread AMD EPYC 9754S at $10,200. The AMD EPYC 9734 is the 112-core part at $9,600. AMD does not offer SKUs below 112 cores because that territory starts to overlap with the 96-core Genoa line.
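As a rough way to compare the list prices above, one can divide each SKU's price by its core count. This is only napkin math on the figures already quoted, not an AMD-published metric:

```shell
# List-price-per-core for the three Bergamo SKUs, using the launch
# prices quoted above (USD).
awk 'BEGIN {
  printf "EPYC 9754:  $%.2f/core\n", 11900 / 128;
  printf "EPYC 9754S: $%.2f/core\n", 10200 / 128;
  printf "EPYC 9734:  $%.2f/core\n",  9600 / 112;
}'
# EPYC 9754:  $92.97/core
# EPYC 9754S: $79.69/core
# EPYC 9734:  $85.71/core
```

Interestingly, by this measure the SMT-less 9754S is the least expensive per core, while the flagship carries a premium for the top bin.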
On the subject of key findings, this is a great screenshot. One of the knocks against traditional x86 designs is that cores vary widely in how they turbo boost. Some cores will sit at low frequencies while subsets burst higher.
To generate a 100% load across all 256 threads, we used stress-ng. Here is the shocking screenshot:
The specs say 3.1GHz. Unlike many other server CPUs, all 128 cores sat at 3.1GHz for hours. This was even in a server with less-than-stellar cooling, where temperatures were in the 75C range, yet all 256 threads were loaded and all 128 cores held 3.1GHz.
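We did not publish the exact command line above, but a run like the following reproduces the general idea: saturate every hardware thread with stress-ng, then sample each core's reported clock from `/proc/cpuinfo` (the timeout here is kept short for illustration; the actual run was left going for hours):

```shell
# Saturate all hardware threads if stress-ng is installed.
# --cpu 0 spawns one worker per online CPU, so it scales from a laptop
# to a 256-thread Bergamo box without editing the command.
if command -v stress-ng >/dev/null 2>&1; then
  stress-ng --cpu 0 --timeout 10s --metrics-brief
fi

# While (or after) the load runs, tally the per-core clocks. On the
# EPYC 9754 under full load, every entry clustered at ~3100 MHz.
grep "cpu MHz" /proc/cpuinfo | sort | uniq -c | sort -rn | head
```

On a typical server CPU, the same tally shows a spread of frequencies across cores; Bergamo's single tight cluster is the unusual result.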
That commitment to a lower maximum frequency helps ensure that no cores outpace others, an important trait of cloud-native processors. What is more, this cloud-native processor is not a stripped-down core. It has AMD’s Zen 4-era ISA with support for AVX-512 and features like bfloat16 and VNNI for AI inference.
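One quick way to verify that claim on a running system is to check the feature flags the Linux kernel reports. On a Zen 4 or Zen 4c part, the list should include entries such as `avx512f`, `avx512_bf16`, and `avx512_vnni` (this is a generic flag check, not something specific to our test setup):

```shell
# List any AVX-512 feature flags the kernel reports for this CPU.
# On hosts without AVX-512 this prints a fallback line instead.
flags=$(grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u)
echo "${flags:-no AVX-512 flags reported on this host}"
```

The same `sort -u` trick works for other flag families (e.g. `sse`, `avx2`) if you want to compare feature sets across machines.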
Put together, the two big trade-offs one is making with Bergamo are:
- Half the L3 cache
- Lower maximum clock speed
Otherwise, this is a drop-in high-core count replacement in AMD’s portfolio.
Next, let us get to performance.