AMD EPYC Genoa Gaps Intel Xeon in Stunning Fashion

November 10, 2022

AMD EPYC Genoa Zen 4 at its Core

The AMD EPYC 9004 series encompasses two major areas of improvement. First, we have the microarchitectural improvements. We then have the system-level improvements. At the first level, the AMD EPYC Zen 4 is a relatively small microarchitectural update from the Zen 3 generation. That is not to say there is no update; it is just not the big leap that Zen 2 to Zen 3 was, or that Zen 4 to Zen 5 is planned to be.

AMD EPYC 9004 Genoa Zen 4 Architecture Overview

One of the biggest changes is that AMD is adding even more cache and doing more work to prime the different cache stages. With double the L2 cache (1MB per core, up from 512KB on Zen 3), the new chips can keep more data local to the cores and avoid going out to L3 or main memory as often. Those accesses take much longer and use more power, so AMD’s solution is to build big caches and then optimize how they are used.

AMD EPYC 9004 Genoa Zen 4 Cache Hierarchy

Zen 4 has improvements over the previous generation that the company says are good for around a 14% IPC uplift. We are going to note here that this figure is not heavily reliant upon new instructions like AVX-512 and VNNI. It simply says that, at the same frequency, Genoa is faster than Milan.

AMD EPYC 9004 Genoa Zen 4 General Improvements For IPC
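
To make the iso-frequency point concrete, here is a minimal sketch of the arithmetic, assuming the simple model that per-core performance scales as IPC times clock speed. The function name `relative_performance` and the 5% clock bump in the second call are hypothetical values chosen for illustration, not AMD figures; only the roughly 14% IPC uplift comes from AMD's claim above.

```python
def relative_performance(ipc_uplift: float, freq_ratio: float = 1.0) -> float:
    """Per-core performance relative to the older part (1.0 means equal).

    Assumes the simplified model: performance = IPC * frequency.
    ipc_uplift is a fraction (0.14 for 14%); freq_ratio is new clock / old clock.
    """
    return (1.0 + ipc_uplift) * freq_ratio


# Same frequency: the whole gain comes from the claimed IPC uplift.
iso_frequency = relative_performance(0.14)

# Hypothetical 5% higher clock on top of the IPC uplift (illustrative only).
with_clock_bump = relative_performance(0.14, freq_ratio=1.05)

print(f"iso-frequency: {iso_frequency:.2f}x")
print(f"with hypothetical clock bump: {with_clock_bump:.3f}x")
```

Real workloads will not follow this model exactly, since memory bandwidth, cache behavior, and boost clocks all move the result.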

Here are some of the key comparisons at different parts of the Zen 3 and Zen 4 microarchitecture.

AMD EPYC 9004 Genoa Zen 4 Evolution From Zen 3

One of the biggest changes for the HPC space (perhaps looking forward to Genoa-X) is the addition of AVX-512, which should give AMD more performance in HPC-style workloads. AMD and Intel have different AVX-512 implementations when it comes to parts of the execution, but the net result is that a class of vectorized operations gets faster.

AMD EPYC 9004 Genoa Zen 4 AVX 512 Bfloat16 And VNNI

Saying a chip supports AVX-512 is easy. The next question is which parts of AVX-512 the chip supports. AMD is adding bfloat16 support, VNNI support, and a number of other AVX-512 instructions. Notably, AMD’s list includes the AI instructions from both Cooper Lake and Ice Lake 3rd generation Xeons. AMD’s strategy is to be a fast follower. Developers can get new instructions on Intel Xeon first. If there is uptake, then AMD can implement them in hardware. That is what we are seeing here. Intel will go beyond AMD Genoa with its Sapphire Rapids, but those new instructions can take some time to implement.

AMD EPYC 9004 Genoa Zen 4 AVX 512 Extensions
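
For readers who want to answer the "what does the chip support" question on their own system, here is a minimal sketch that lists the AVX-512 feature flags Linux reports in `/proc/cpuinfo`. It assumes a Linux host; the helper names `avx512_flags` and `read_local_flags` are our own, and the flag names checked at the end (`avx512_vnni`, `avx512_bf16`) are the Linux kernel's labels for the VNNI and bfloat16 extensions discussed above. This is not an exhaustive or AMD-specific check.

```python
from __future__ import annotations

from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-512 feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return {flag for flag in value.split() if flag.startswith("avx512")}
    return set()


def read_local_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read flags from the local machine; empty set if the file is absent."""
    cpuinfo = Path(path)
    return avx512_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()


if __name__ == "__main__":
    flags = read_local_flags()
    print("AVX-512 flags:", ", ".join(sorted(flags)) or "none reported")
    for wanted in ("avx512_vnni", "avx512_bf16"):
        print(f"{wanted}: {'yes' if wanted in flags else 'no'}")
```

On a machine without AVX-512, or on a non-Linux system, the script simply reports that no flags were found.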

We are not going to cover the general ISA changes, but there are a few more changes under the hood with Zen 4.

Something we will be using is the set of security and virtualization changes. One of the big ones is SMT protection for guests. AMD has the ability to ensure that there is not an active sibling thread on a core, removing a potential attack vector in multi-tenant (cloud/virtualized) environments.

AMD EPYC 9004 Genoa Zen 4 ISA Security And Virtualization
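
To illustrate what a "sibling thread" is, here is a small sketch that parses the CPU list format Linux exposes in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. It does not exercise AMD's guest SMT protection in any way; it only shows how to tell, from the host side, whether a logical CPU shares its core with another thread. The function names and the sample strings are hypothetical.

```python
def parse_cpu_list(text: str) -> list[int]:
    """Expand a Linux CPU list such as "0,96" or "0-3,8" into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def has_smt_sibling(siblings_text: str) -> bool:
    """True if the core behind this sibling list runs more than one thread."""
    return len(parse_cpu_list(siblings_text)) > 1


# Hypothetical sample values, not captured from a real Genoa system.
print(has_smt_sibling("0,96\n"))  # two logical CPUs share the core
print(has_smt_sibling("5\n"))     # the logical CPU has the core to itself
```

A scheduler or hypervisor that wants to avoid co-tenancy on a core works from exactly this kind of topology information; the Zen 4 feature moves the guarantee into hardware.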

We are going to let you read through the debugging and profiling improvements. 99%+ of users will never use these, but the last <1% will see these as extremely important for improving application performance.

AMD EPYC 9004 Genoa Zen 4 Debug And Profiling

Zen 4 is a fairly well-known architecture at this point. What is really new for Genoa is how Zen 4-based CCDs are packaged to make monster chips. That is what we are going to explore next.

21 COMMENTS

Gasmanc November 10, 2022 At 12:27 pm

Any chance of letting us know what the idle power consumption is?
ssnseawolf November 10, 2022 At 12:57 pm

$131 for the cheapest DDR5 DIMM (16GB) from Supermicro’s online store

That’s $3,144 just for memory in a basic two-socket server with all DIMMs populated.

Combined with the huge jump in pricing, I get the feeling that this generation is going to eat us alive if we’re not getting those sweet hyperscaler discounts.
hoohoo November 10, 2022 At 1:54 pm

I like that the inter CPU PCIe5 links can be user configured, retargeted at peripherals instead. Takes flexibility to a new level.
Stephen Beets November 10, 2022 At 2:35 pm

Hmm… Looks like Intel’s about to get forked again by the AMD monster. AMD’s been killing it ever since Zen 1. So cool to see the fierce competitive dynamic between these two companies. So Intel, YOU have a choice to make. Better choose wisely. I’m betting they already have their decisions made. :-)
Jorge November 10, 2022 At 3:08 pm

2 hrs later I’ve finished. These look amazing. Great work explaining STH
fuzzyfuzzyfungus November 10, 2022 At 6:17 pm

Do we know whether Sienna will effectively eliminate the niche for threadripper parts; or are they sufficiently distinct in some ways as to remain as separate lines?

In a similar vein, has there been any talk (whether from AMD or system vendors) about doing Ryzen designs with ECC that’s actually a feature rather than just not-explicitly-disabled, to answer some of the smaller Xeons and server-flavored Atom derivatives?

This generation of epyc looks properly mean; but not exactly ready to chase xeon-d or the atom-derivatives down to their respective size and price.
Chris S November 10, 2022 At 6:46 pm

I look at the 360W TDP and think “TDPs are up so much.” Then I realize that divided over 96 cores that’s only 3.75W per core. And then my mind is blown when I think that servers of the mid 2000s had single core processors that used 130-150W for that single core.
Chris S November 10, 2022 At 6:52 pm

Why is the “Sienna” product stack even designed for 2P configurations?

It seems like the lower-end market would be better served by “Sienna” being 1P only, and anything that would have been served by a 2P “Sienna” system instead use a 1P “Genoa” system.
BillB November 10, 2022 At 10:55 pm

Dunno, AMD has the tech, why not support single and dual sockets? With single and dual socket Sienna you should be able to beat price *AND* price/perf compared to the Intel 8 channel memory boards for uses that aren’t memory bandwidth intensive. For those looking for max performance and bandwidth/core AMD will beat Intel with the 12 channel (actually 24 channel x 32 bit) Epyc. So basically Intel will be sandwiched by the cheaper 6 channel from below and the more expensive 12 channel from above.
Olaf November 11, 2022 At 1:13 am

With PCIe 5 support apparently being so expensive on the board level, wouldn’t it be possible to only support PCIe 4 (or even 3) on some boards to save costs?
George November 11, 2022 At 3:01 am

All the other benchmarks are amazing, but I saw a molecular dynamics test on another website and Houston, we have a problem! Why?
FishnChips IT November 11, 2022 At 7:05 am

Olaf Nov 11 I think that’s why they’ll just keep selling Milan
Delamain November 11, 2022 At 12:31 pm

@Chris S

Siena is a 1p only platform.
Tim W November 11, 2022 At 1:15 pm

Looks great for anyone that can use all that capacity, but for those of us with more modest infrastructure needs there seems to be a bit of a gap developing where you are paying a large proportion of the cost of a server platform to support all those PCIE 5 lanes and DDR5 chips that you simply don’t need.

Flip side to this is that Ryzen platforms don’t give enough PCIE capacity (and questions about the ECC support), and Intel W680 platforms seem almost impossible to actually get hold of.

Hopefully Milan systems will be around for a good while yet.
Sabon November 15, 2022 At 8:05 am

You are jumping around WAY too much.

How about stating how many levels there are in CPUS. But keep it at 5 or less “levels” of CPU and then compare them side by side without jumping around all over the place. It’s like you’ve had five cups of coffee too many.

You obviously know what you are talking about. But I want to focus on specific types of chips because I’m not interested in all of them. So if you broke it down in levels and I could skip to the level I’m interested in with how AMD is vs Intel then things would be a lot more interesting.

You could have sections where you say that they are the same no matter what or how they are different. But be consistent from section to section where you start off with the lowest level of CPUs and go up from there to the top.
Rob November 19, 2022 At 3:16 pm

There may have been a hint on pages 3-4 but I’m missing what those 2000 extra pins do, 50% more memory channels, CXL, PCIe lanes (already 160 on previous generation), and …
Greg December 11, 2022 At 12:37 pm

Does anyone know of any benchmarking for the 9174F?
EricT March 6, 2023 At 4:40 pm

On your EPYC 9004 series SKU comparison the 24 cores 9224 is listed with 64MB of L3.
As a chiplet has a maximum of 8 cores, one needs a minimum of 3 chiplets to get 24 cores.
So unless AMD disables part of the L3 cache of those chiplets, a minimum of 96 MB of L3 should be shown.

I will venture the 9224 is a 4 chiplets sku with 6 cores per chiplet which should give a total of 128MB of L3.
Patrick Kennedy March 6, 2023 At 6:45 pm

EricT – I just looked up the spec, it says 64MB https://www.amd.com/en/products/cpu/amd-epyc-9224
EricT March 7, 2023 At 8:40 am

Patrick, I know, but it must be a clerical error, or they have decided to reduce the 4 chiplets L3 to 16MB which I very much doubt.
3 chiplets are not an option either as 64 is not divisible by 3 ;-)

Maybe you can ask AMD what the real spec is, because 64MB seems weird?
Andrew June 25, 2023 At 10:06 pm

@EricT I got to use one of these machines (9224) and it is indeed 4 chiplets, with 64MB L3 cache total. Evidently a result of parts binning and with a small bonus of some power saving.
