AMD EPYC Bergamo Launched 128 Cores Per Socket and 1024 Threads Per 1U

2
AMD EPYC Bergamo Cover With Dr Lisa Su Large
AMD EPYC Bergamo Cover With Dr Lisa Su Large

Today AMD launched a new series of processors, the AMD EPYC Bergamo. In many ways, this is something new, with up to 128 cores per socket and 256 threads. Given the popularity of 2U 4 node chassis on the market, this will make a 2U4N platform a 2048 thread machine or one with 1024 threads per U. AMD’s approach is straightforward. It is trading cache size for core density in this generation. This is one of AMD’s biggest launches ever.

We are doing these live during AMD’s event, so please excuse typos here.

AMD EPYC Bergamo Launched 128 Cores Per Socket

Setting the stage, the AMD EPYC “Bergamo” or the EPYC 97×4 series uses a core called “Zen 4c”. The simple idea of this core is that AMD is reducing the L3 cache so it can fit more cores onto each CCD. AMD says its Zen 4c core is 35% smaller than the Zen 4 core, largely from reducing cache sizes and optimizing the footprint around that.

AMD Zen 4c 35 Percent Smaller Core Large
AMD Zen 4c 35 Percent Smaller Core Large

For a SP5, that means up to 128 cores.

AMD EPYC Bergamo Launch Large
AMD EPYC Bergamo Launch Large

The construction of Bergamo should be fairly familiar. We have the same IO die as we have with the AMD EPYC 9004 Genoa family, this is mostly a CCD change. That means we have the same memory, PCIe Gen5, and dual socket capabilities. If you want to learn more about the platform, see AMD EPYC Genoa Gaps Intel Xeon in Stunning Fashion.

AMD EPYC Zen 4 And Zen 4c Large
AMD EPYC Zen 4 And Zen 4c Large

In terms of performance, initial benchmarks put this part within striking distance of 1000 SPECrate2017_int_base per socket. That is awesome, and it is one of the biggest differentiators of this part. These are designed for scale-out cloud instances, not performance-per-core applications. Zen 4c, being a cache-reduced Zen 4, still is a very capable core and has features like AVX-512 and VNNI for AI inference.

AMD EPYC Bergamo Zen 4 And Zen 4c Large
AMD EPYC Bergamo Zen 4 And Zen 4c Large

Beyond that, the real goal is to provide a 256 vCPU, per socket platform. That spec is one where AMD and Ampere are going to each claim leadership. Ampere AmpereOne has up to 192 Arm cores per socket. Since it is not realistic for most cloud providers to do a single vCPU instance size splitting physical and SMT threads into separate vCPUs, Ampere will claim it has leadership in odd-numbered vCPU instance sizes with 192 physical cores to AMD’s 128. If one has an even number of vCPUs, say 2 vCPUs, then AmpereOne will have a maximum of 96 instances per socket while AMD will be at 128 still. For its part, Intel has a maximum of 60 cores/ 120 threads in this generation. Intel is counting on Sierra Forest to play in this space using its rapidly evolving E-cores.

Final Words

We will be testing Bergamo soon enough, today is just the launch (again.) Make no mistake though, this is a HUGE launch. When we discuss chips with hyper-scale vendors and major chip companies, the 5-year projections vary on how much will be these scale-out cores. Some say the answer is 25% of the market. Others are as high as 75%. What we know is that this is a huge new emerging segment and AMD has a plan to capture the market.

AMD EPYC Bergamo Package 1
AMD EPYC Bergamo Package 1

AMD is also announcing Genoa-X, the 3D V-Cache version of its EPYC Genoa today. Looking at AMD’s tools now, they can reduce the core footprint by reducing cache sizes on CCDs. AMD has another technology, however, to add a cache on a different plane using 3D V-Cache.

One of the cool announcements is that Meta has deployed “hundreds of thousands” of AMD servers since it started deploying AMD in 2019. Meta says that Bergamo is showing a 2.5x performance uplift over Milan, so that is why Meta will be deploying Bergamo as a next-generation platform.

AMD EPYC Bergamo Package 2
AMD EPYC Bergamo Package 2

Making a clear call today: AMD is going to put these two technologies together and create higher core count processors with 3D V-Cache added to pump core counts much higher in the future.

2 COMMENTS

  1. “… this will make a 2U4N platform a 2048 thread machine or one with 1024 cores per U.”

    2048 threads/2U is correct but
    4 nodes/2U * 2 sockets/node * 128 cores/socket = 1024 cores/2U = 512 cores/U

    Am I missing something?

  2. “AMD says its Zen 4c core is 35% smaller than the Zen 4 core, largely from reducing cache sizes and optimizing the footprint around that.”

    I don’t think this statement is true. The cache part of the core is where they’ve reduced the footprint the least. The reduction in cache is halving of L3 cache per core, but that’s not part of the core.

    The 35% reduction is probably due to a different frequency optimization point, and through having traditional Zen 4 verified structures which probably means you can make stricter assumptions.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.