Welcome back Intel! Intel Xeon has trailed AMD EPYC in P-core counts for around seven years. Five years ago, AMD pulled far ahead with the AMD EPYC 7002 “Rome” series and never looked back in terms of raw compute. Today marks the first time in about 86 months that Intel has a leadership server x86 CPU again. The Intel Xeon 6 with P-cores series, more aptly named the Intel Xeon 6900P series, brings 128 cores, 12 memory channels, accelerators, new process technology, and more to Intel Xeon.
Of course, there is a lot going on here, so let us get to it.
Video Version
We had a very short amount of time to do this one. Last week, we were at Intel in Oregon learning about the new chips, but then we went to film what will be our biggest video of the year just after. Our pre-production “Granite Rapids-AP” system arrived, and we had the weekend to work on it, which was a challenge when some benchmarks took over a day to run through test scripts on the 512-thread system.
Still, Intel furnished us with a pre-production development system to use with its top-bin chips. We need to say this is sponsored by Intel. For some of the power figures we usually would want to publish on a release day piece like this, we are going to wait for an OEM system with more realistic fan curves. The Intel platform was rough around the edges. That is to say, we are going to have more on this story.
For our Substack subscribers, we have posted the high-resolution JPEGs of Granite Rapids-AP and the rest of the Xeon 6 family that we took in Oregon. Let us get to it.
When a Xeon is Not Just a Xeon, but a XEON
Starting here, it is essential to understand that Xeon 6 is like an ultimate slow roll-out. Today, we have the Intel Xeon 6900P series, the top-end part with 128 P-cores. A few months ago, we reviewed the Intel Xeon 6700E series “Sierra Forest,” which has 144 E-cores and uses a different socket and has half the TDP. Both are Intel Xeon 6, but they are very different. That leads to the Xeon 6 family covering a lot of ground, but not necessarily all in the same product.
For years, when we discussed a generation of Intel Xeon CPUs, it was the same socket and same core architecture, so long as we overlook aberrations like the LGA1356 Sandy Bridge-EN and Ivy Bridge-EN. Today, we effectively have a 2×2 matrix of E-core and P-core platforms, and today's launch is the 12-channel P-core platform.
Important to note is that this is not the high-core count “Sierra Forest-AP” 288 core launch for scale-out cloud-native workloads. The Intel Xeon 6900P “Granite Rapids-AP” is Intel’s big iron dual socket Xeon for high-performance computing. We get 12 channels of DDR5-6400 or 8800MT/s MRDIMM/ MCR DIMM memory (more on this in a bit), so Intel can now match AMD’s memory channel count and exceed AMD’s memory bandwidth. 128 full P-cores is more than AMD currently offers (96 with Genoa, since Bergamo’s higher core counts come from lower-cache Zen 4c cores.) There are 96 lanes of PCIe Gen5 per CPU for 192 lanes total, and there is CXL 2.0 support, all while enabling a full 6 UPI links for socket-to-socket bandwidth. L3 cache is no longer an “AMD has way more” story on mainstream parts (non Genoa-X) now that the Intel Xeon 6980P has 504MB of L3 cache.
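As a back-of-the-envelope check on those memory claims, theoretical peak bandwidth is just channels × transfer rate × bus width. Here is a quick sketch of that standard DDR5 math, assuming a 64-bit (8-byte) data bus per channel:

```python
def peak_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    channels  - populated memory channels per socket
    mts       - transfer rate in MT/s (e.g. 6400 for DDR5-6400)
    bus_bytes - bytes moved per channel per transfer (64-bit bus = 8 bytes)
    """
    return channels * mts * bus_bytes / 1000  # MT/s * bytes = MB/s -> GB/s

# 12 channels per socket: standard DDR5-6400 vs 8800 MT/s MRDIMMs
print(peak_bandwidth_gbs(12, 6400))  # 614.4
print(peak_bandwidth_gbs(12, 8800))  # 844.8
```

At 8800 MT/s, the MRDIMMs are worth roughly 37.5% more theoretical bandwidth than DDR5-6400 on the same 12 channels, which lines up with the delivered-bandwidth gains we will discuss later.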
While we focus a lot on the top-end SKUs, a lot of organizations buy midrange parts. That is something Intel will be rolling out in the future in its smaller socket designs. This is important as Intel will have modern parts for those who may want 32 cores per socket, but are not going to populate 12 memory channels and spend a lot on expensive motherboards that can handle larger sockets.
Given that Intel has another socket and other families of CPUs, the Xeon 6900P series comprises only five public SKUs, ranging from 72 to 128 cores. The 128 core part is the only one whose core count is not divisible by three, so we would expect hyper-scalers and others to have custom SKUs based on the 120 core part (Intel Xeon 6979P), but Intel has the 128 core SKU as well. Also of note, four of the five feature an unapologetically high 500W TDP, which is new territory for Intel Xeon CPUs.
Another interesting part is the Intel Xeon 6960P with 72 cores, the same count as the Grace CPU portion of an NVIDIA Grace Hopper platform. Intel is using SMT, so it is technically a 72 core/ 144 thread part, and the lower core count also gives Intel around 6MB of L3 cache per core and higher clock speeds. For AI servers, Intel has been winning sockets even without these new monster CPUs, and we will discuss why later in this piece.
Getting to the chips, here is the lscpu output of the Intel Xeon 6980P, the top-bin 128 core/ 256 thread part in a dual socket configuration. As you can see, we have over 1GB of L3 cache in the system and plenty of cores.
At the same time, we expect many of these systems to be run as three NUMA nodes because of how the silicon is constructed.
Intel keeps its memory controllers on the same physical die or compute tile as its cores. As a result, keeping memory access localized on those tiles can yield better performance.
It also yields a somewhat funky topology, since two of the SNC3 NUMA nodes have 43 cores and one has 42 cores. Intel has a 120 core SKU that might be more popular for both yield and balance purposes. Still, it would have been cool if Intel had used a 3× 43-core tile design to make a 129 core CPU just as a marketing SKU to say it has 129 cores, or one more than AMD.
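The uneven 43/43/42 node sizes are simply what you get when dividing 128 cores across three compute tiles. A toy sketch illustrates the split (the helper is our own illustration, not anything Intel ships):

```python
def snc_partition(cores: int, nodes: int) -> list[int]:
    """Split a socket's cores across SNC NUMA nodes as evenly as possible."""
    base, extra = divmod(cores, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]

print(snc_partition(128, 3))  # [43, 43, 42] - the 6980P's funky layout
print(snc_partition(120, 3))  # [40, 40, 40] - the 6979P splits evenly
```

This is also why the 120 core SKU looks attractive for NUMA-sensitive workloads: every SNC3 node gets the same core count.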
You can easily see this tiled architecture when looking at core-to-core latency charts. As unreadable as this probably looks after being compressed for the web, just know this is the 128 core hyper-threading-off version. The 512 thread dual socket version took forever to run and was even more of an eye chart.
The behavior above can be explained by Intel’s design, putting three large compute tiles on a chip along with two I/O dies.
Part of what allows Intel to come back into the orbit of AMD’s top-end parts, and to be competitive with AMD’s next-generation Turin, is that it is using new process technology. Intel 3 is used for the compute dies, which also house the memory controllers, and Intel 7 for the I/O dies with the chip’s UPI, PCIe, and accelerators.
AMD pulled ahead in 2019 with Rome, partly by moving to a chiplet design and partly because Intel 10nm was so delayed. Now that Intel’s process technology is rapidly improving, we will see more chips like this. Intel is also bridging chiplets with more advanced EMIB packaging, which is why its tiles look tightly packed while AMD’s compute tiles look like their own islands next to AMD’s I/O dies.
Still, the shift for Intel is very notable in this generation. Instead of only focusing on workloads accelerated by the company’s built-in accelerators, Intel now has a monster chip that can go head-to-head with AMD on raw CPU performance, but then also has its accelerators built-in.
One of Intel’s biggest features, however, is integrating those memory controllers into compute tiles, and then offering very fast memory options, so let us get to that next.
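Theoretical numbers aside, a crude way to see what a box actually delivers is timing a large in-memory copy. This is nowhere near a proper STREAM run (it is single-threaded, stays on one NUMA node, and counts the copied bytes once), just a hedged sketch:

```python
import time

def copy_bandwidth_gbs(size_mb: int = 128, reps: int = 3) -> float:
    """Best-case GB/s for a plain buffer copy, counting copied bytes once."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        dst = bytes(src)  # copies the whole buffer
        best = min(best, time.perf_counter() - t0)
        del dst
    return (size_mb / 1024) / best  # GB copied / best elapsed seconds

print(f"~{copy_bandwidth_gbs():.1f} GB/s single-threaded copy")
```

On a 12-channel MRDIMM system you would want one pinned instance per SNC node, summed, to approach the platform figures we show in the memory section.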
Wow, can’t even hide Patrick’s love affair with Intel anymore, can we? Intel has not even properly launched this, but yet it’s 128c Intel vs 96c Genoa, and AMD will have the same 128c in 2 weeks’ time… just be honest finally and call it servingintel.com ;-)
Yawn… Still low on PCIe lanes for a server footprint when GPUs and NVME storage is coming fast and furious. Intel needs to be sold so someone can finally innovate.
Whether love or not, the numbers are looking good. For many, the important questions will be yield rates and pricing.
I wonder why Epyc is missing from the povray speed comparison.
One thing I’d like to see is a 4-core VM running Geekbench 6 while everything else is idle. After that Geekbench for an 8-core VM, 16-core, 32-core and so forth under similar circumstances. This sort of scaling analysis would help determine how well balanced the MCRDIMM memory subsystem is to the high-core-count processors–just the kind of investigative journalism needed right now.
As an aside, I had to work through eight captchas for this post.
The keyword would be availability. I checked just now, and these newer parts don’t have 1k Tray Pricing published yet, so I am not sure when they would be available. It felt painful to restrict the On-Premise Server procurement specification to 64 cores to get competitive bidding across vendors. Hope for the best.
It is hard not to get excited about competition. Intel has finally done it; they launched something faster than AMD’s previous generation… Intel’s focus on AMX acceleration seems to have paid off. I guess we shall see when Turin launches in a few weeks.
@Patrick how do you manage to call ~53GB/S “closing in on” ~330GB/S? Even dual GR is slower by a factor of three.
Well there’s 90 minutes of my life well spent. I’d like to thank Patrick and the STH krew on this one.
Rodigas I didn’t get that sense at all. Intel’s the first to launch a 500W TDP part on a modern process and they’ve got cores and memory bandwidth so they’re winning for now. In Q4 when Turin is out we’ll say he loves AMD. It’s shaping up like Intel will at least show it’s competitive with Turin. That’s great for everyone.
Eric O – I’d like to see GB5 not 6. You’ve hit it before, GB6 is useless even for workstation class multicore.
Ram is right, these aren’t really available yet. Paper launch, or HyperScale only launch.
Emerth do you mean that 543391MB/S is much more than 330GB/S? The screenshots in the memory section show Intel’s far ahead. With MCRDIMMs adding 38% more bandwidth they’re getting close to 750GB/s on one CPU. So maybe they meant to say the Grace dual chip is almost up to a GR MCRDIMM single chip?
Intel’s doing a fine job. I can’t wait for 18A chips.
@RamShanker & francis:
– ASUS has a webpage up; search for ASUS “RS920Q-E12”, not quite for sale yet, but there’s a PDF.
– NextPlatform has published a guesstimate of 6980P U$24,980 and 6979P U$24,590; with lower prices for trays. Prices are fairly meaningless ATM with the competitor’s launch imminent.
> Intel is differentiating on things like features depending on the core type and SKU
Glad for the competition, but really wish they’d simplify the stack of SKUs. Is Intel On Demand gone?
Did Intel say why all the black ink on that Clearwater Forest chip?
That seems to be a very risky chip… GAA, BSPD, hybrid bonding, and 18A all being introduced in the same chip. Did they actually demo one?
Never thought I’d see a vulture capitalist group (Apollo Global Management) investing in Intel. I thought Gelsinger was supposed to be Intel’s savior?
As others have pointed out, this seems a bit biased toward the Intel side of things.
Yes, we’re all glad to see them finally getting their house in order and competing, but do better on containing your fanfare.
Wow. So many haters claiming bias. Go back and re-read the linked epic Rome review from 2019.
When I compare that to this one, all I see is that good products get good reviews (this one) and great products get great reviews (Rome). I also noticed how thankful Patrick is to have Intel be competitive at the top of the line, which it is with this latest launch, and how awesome it was back in 2019 to have AMD jump out and surpass Intel just a few years after they were nearly bankrupt.
For detailed benchmarks I refer all to Phoronix – but a very nice piece by ServeTheHome.
many thanks, L
So, different kinds of “leadership” …
According to Michael at Phoronix, this year’s Intel 6980P is 12% faster than last year’s AMD 9684X.
But, the 6980P has a 700 TDP and the 9684X has a 400 TDP (while remembering that comparisons of their TDPs aren’t exactly equal), and AMD costs U$10K less. So, 75% more TDP and 5x more $ (unfairly comparing guesstimated MSRP vs discount pricing). With the new Turin (coming RSN) offering moar Coors and a big bump over AMD’s last generation, in the same socket.
Making a tortoise and hare comparison would be confusing as to who is who and who is ahead at a particular point in time.
We appreciate the effort it takes to put together these articles and enjoy reading them; except for the shameful core latency mushy-pea-soup image, while other sites have tack-sharp tiny numbers and a reasonably sized image file nonetheless.
I need Intel to go up so I can give Grandma her retirement back…
We all know Turin is coming. At least AMD now needs to push really hard instead of just coasting because Intel’s been so far behind. Let Intel have its weeks at the top.
On the plus side, Epyc now has some competition coming. The one big pain point will be software licensing where it’s licensed per core.
What is up with that lscpu output for the SNC3 configuration? It reports:
Node 0: 43 threads
Node 1: 43 threads
Node 2: 73 threads
Node 3: 86 threads
Node 4: 86 threads
Node 5: 84 threads
And then threads 256-352 are completely unaccounted for.
@emerth: I see 0.5TB/s in stream on 128 cores while NVLD seems to go to 0.6 TB/s — so I’d agree with “closing” here.
Have the Intel fanboys forgotten AMD Turin with 192 cores? That is always the case; Intel concentrated on quarterly profits instead of keeping R&D in good shape. Now it is better to concentrate on selling factories to someone that needs “old school” stuff. Game over. There could be some light if they could soon get a 256c part out, which is very unlikely. AMD will do it soon anyway, most likely a minor change to just add 20% more cores. But fanboys are fanboys and always forgetting the truth.