AMD EPYC 9004 Genoa 12-channel DDR5-4800
On the memory side, this generation is the DDR5 generation, and that is extremely important to understanding Genoa. A 64-core Genoa chip, for example, gets 12 channels of memory, a 50% increase over Milan. Beyond that, AMD is also going from DDR4-3200 to DDR5-4800 speeds, for a huge jump in per-channel bandwidth. Intel's Sapphire Rapids will be 8x DDR5-4800, as we have seen, so AMD has 50% more memory channels. CXL is, of course, the other side of memory bandwidth in this generation, so we will get to that later.
While we get more performance, there is another side to this. DDR5 moves more components onboard the DIMM, such as the PMIC for power management. That, plus moving to a new production generation, means DDR5 prices are much higher; we have been paying around 50% more. For Genoa, that means each DIMM costs 50%+ more than the DDR4 that Milan uses, and there are 50% more channels to fill. That has led to some interesting developments like non-binary memory to reduce costs and match AMD's 48 and 96-core offerings on a GB/core basis.
AMD is not supporting 2DPC on its dual-socket platforms at launch, nor is it supporting features like LRDIMMs. Here is a look at what AMD is supporting with Genoa.
All of this leads to >2x the memory bandwidth on a per-core basis for a 64-core part. Of course, a lot of the reason for 12 memory channels is to keep the same ratio on the top-end parts. AMD had one memory channel per 8 cores on the previous-generation Rome and Milan 64-core parts. It now has the same ratio on the 50% larger 96-core parts.
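The ratios above are easy to sanity-check with back-of-envelope math. This sketch assumes the standard 8 bytes per transfer per DDR channel and the speeds discussed above (Milan = 8x DDR4-3200, Genoa = 12x DDR5-4800); it is an illustration, not a benchmark.

```python
# Back-of-envelope check of the per-core bandwidth claims.
# Assumes 8 bytes per transfer per DDR channel; GB/s are decimal.

def peak_bw_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * mts * bytes_per_transfer / 1000

milan_bw = peak_bw_gbs(8, 3200)    # 204.8 GB/s
genoa_bw = peak_bw_gbs(12, 4800)   # 460.8 GB/s

# 64-core Genoa vs 64-core Milan: per-core bandwidth more than doubles.
print((genoa_bw / 64) / (milan_bw / 64))   # 2.25

# 96-core Genoa keeps Milan's one-channel-per-8-cores ratio.
print(96 / 12, 64 / 8)                     # 8.0 8.0
```

Note the 2.25x figure is theoretical peak; real workloads will see less, but the >2x per-core claim holds at equal core counts.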
With the bigger chip, AMD has modes to partition the processor into smaller segments of up to three dies and three memory channels per partition (times four for the entire processor). Intel just disclosed its SNC4 and UNC modes for Intel Xeon Max this week as well. These are of a similar feature class, although not exactly the same.
All of this discussion of memory bandwidth is not complete without CXL.
AMD EPYC 9004 Genoa CXL Overview
The new chips support CXL 1.1 with some forward features. AMD is only supporting Type 3 memory buffers that one can think of as memory expansion devices. These generally show up in operating systems as new NUMA nodes with attached memory capacity, but without CPUs.
Latency is on the order of accessing memory connected to the remote socket’s CPU in a dual-socket server. Here is the latency hierarchy that we saw in Compute Express Link CXL Latency How Much is Added at HC34.
The key here is that with up to 64 lanes that can be used for CXL devices, and a CXL 1.1 x16 connection being roughly as much bandwidth as two DDR5 channels, AMD can, in theory, get not just more memory capacity with CXL 1.1 devices, but also more available bandwidth (whether it can use that bandwidth is another story). If it could use all of the memory channels, then a 1P Genoa system would, in theory, have 12 local DDR5 channels plus ~8 more channels' worth via CXL 1.1 devices that will look like memory sitting on other non-processor-attached NUMA nodes. That is why the next generation of systems with CXL is going to start getting crazy.
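Here is a rough sketch of the numbers behind that claim. It assumes CXL 1.1 rides PCIe Gen5 signaling (32 GT/s per lane, 128b/130b encoding), counts one direction only, and ignores protocol overhead, so these are optimistic upper bounds rather than measurements.

```python
# Back-of-envelope CXL 1.1 bandwidth math, assuming PCIe Gen5 signaling
# (32 GT/s/lane, 128b/130b encoding), one direction, no protocol overhead.

GEN5_GTS = 32                      # GT/s per PCIe Gen5 lane
ENC = 128 / 130                    # 128b/130b line-encoding efficiency
DDR5_CH_GBS = 4800 * 8 / 1000      # one DDR5-4800 channel: 38.4 GB/s

x16_gbs = 16 * GEN5_GTS * ENC / 8  # one CXL x16 link, GB/s per direction
print(round(x16_gbs, 1))           # ~63 GB/s, vs 76.8 GB/s for two DDR5 channels

local_gbs = 12 * DDR5_CH_GBS       # 12 local DDR5 channels: 460.8 GB/s
cxl_gbs = 4 * x16_gbs              # 64 lanes = four x16 links: ~252 GB/s
print(round(local_gbs + cxl_gbs, 1))  # theoretical 1P total with CXL expansion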
Talking about the parts is great, but next, let us get to the SKUs themselves.