4th Gen Intel Xeon Scalable Sapphire Rapids Leaps Forward

Market Impact 2023: Intel Sapphire Rapids vs. AMD EPYC 9004 Genoa

It is probably going to be the hottest topic for the next few quarters. Let us cut the marketing speak and get to where we are in the market:

Intel And AMD Core Growth 2010 To Sapphire Rapids

If you want the maximum performance per socket, the AMD EPYC 9654 is still the king, but now with an asterisk. Intel has a number of accelerators, ranging from on-core features like AMX for AI inference to dedicated engines like QAT, IAA, DLB, and DSA. If those accelerators are used, Intel can make a run at maximum performance while using fewer cores.
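
Since AMX sits on the cores rather than behind a separate device, software can probe for it directly. As a minimal sketch (the flag names `amx_tile`, `amx_bf16`, and `amx_int8` are the real Linux CPU feature flags; the helper itself is illustrative):

```python
# Sketch: Linux exposes Sapphire Rapids' AMX support as CPU feature
# flags (amx_tile, amx_bf16, amx_int8) in /proc/cpuinfo. The flag
# names are real kernel flags; the helper itself is illustrative.

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def amx_flags_present(cpuinfo_text: str) -> set:
    """Return whichever AMX flags appear in a /proc/cpuinfo dump."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            found |= AMX_FLAGS & set(line.split(":", 1)[1].split())
    return found

# Usage on a live system: amx_flags_present(open("/proc/cpuinfo").read())
```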

Intel Vision 2022 Sapphire Rapids HBM Top 1

The dark horse in the performance hunt is the Intel Xeon Max line. Intel has been marketing these chips primarily to the HPC space, partly because many of the early parts need to make their way to the Aurora supercomputer; broader availability will likely come by the end of February or early March. The Xeon Max line's ability to use its HBM as a distinct memory tier, as the only tier of memory, or in caching mode should intrigue many buyers. HBM2e in caching mode will increase memory bandwidth considerably. The 32-core Intel Xeon Max 9462, with its higher clock speeds, may be a very valid Xeon Platinum 8462Y+ competitor in many workloads at a roughly $2,000 premium. A wise IT organization would take a look at Xeon Max since it offers a new performance vector.

4th Generation Intel Xeon Scalable Sapphire Rapids 2

Still, after months with the parts and having now written a ~10,000-word piece on them, it feels like the maximum impact of Sapphire Rapids was missed. Intel's story is one extolling the virtues of acceleration, yet 56% of its SKUs do not have QuickAssist, one of its most useful accelerators given that most servers today handle crypto and compression tasks. Intel also has not shown its accelerator roadmap. All of these accelerators are here today, but will they be in Granite Rapids, Emerald Rapids, Sierra Forest, or other future processors? If not, why would a developer target QuickAssist rather than a DPU for crypto and compression offload? That is a tough question, made tougher by Intel holding back accelerators to push its Intel On Demand agenda.
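
That portability question is usually answered in software with a capability probe plus a fallback path, so the same binary runs whether or not the accelerator is present. A minimal Python sketch of the pattern, where `qat_compress` is a hypothetical stand-in for a real QAT binding such as QATzip:

```python
import zlib

def qat_compress(data: bytes) -> bytes:
    """Hypothetical hardware path -- a stand-in for a real QAT binding
    such as QATzip. Raises when the accelerator is absent on this SKU."""
    raise NotImplementedError("no QAT device available")

def compress(data: bytes) -> bytes:
    # Try the accelerator first, then fall back to the portable
    # software path so the same code runs on QAT-less SKUs.
    try:
        return qat_compress(data)
    except NotImplementedError:
        return zlib.compress(data)
```

The cost of maintaining that fallback is exactly why developers hesitate to target an accelerator that ships on fewer than half of the SKUs.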

4th Generation Intel Xeon Scalable Sapphire Rapids 9

Intel On Demand has been compared to BMW charging a subscription for heated seats. I understand Intel's desire to move to an as-a-service model. At the same time, let us get real for a moment: PCIe accelerator cards for functions like QuickAssist are not that much more expensive. We have been paying a $200-300 premium for DPUs over high-speed NICs. So how much is an OEM like Lenovo going to charge for a QAT accelerator or two? If I purchase a server with four Intel Xeon Platinum 8460H's at over $10,000 list price each, how much is an OEM like Lenovo going to charge for two QAT accelerators that deliver a portion of the functionality we get from a $300 NIC-to-DPU premium?

4th Gen Intel Xeon Scalable Sapphire Rapids QAT Accelerator Distribution

An inescapable feeling with Sapphire Rapids is that these parts would be extremely compelling if the accelerators were turned on and more than half of this generation's servers had immediate access to the acceleration. With the hodgepodge of acceleration capabilities, even on high-dollar parts, it feels strange that Intel decided to withhold its top competitive advantage over AMD from more than half of its lineup.

Let us get a little more personal here. We saw Intel’s vision using an old copy of STH’s web hosting stack updated with QAT. That change flipped the per-core performance script even with a massive cache and clock speed deficit.
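
For context, the QAT path in a web stack like that runs the TLS handshake crypto through Intel's QAT OpenSSL engine and an async-capable nginx build. A rough configuration sketch, assuming Intel's QAT_Engine and asynch-mode nginx fork (verify the directive names against your build; the certificate paths are placeholders):

```nginx
# Sketch only: assumes Intel's QAT_Engine (OpenSSL) and the
# asynch-mode nginx fork are installed.
ssl_engine qatengine;              # load the QAT OpenSSL engine

http {
    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/server.crt;   # placeholder
        ssl_certificate_key /etc/nginx/server.key;   # placeholder
        ssl_asynch on;             # hand handshake crypto to QAT
    }
}
```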

Intel V AMD At 32 Cores STH Nginx Stack Performance QAT Impact

Intel will sell a large number of the new 4th Gen Xeon parts. For existing Intel customers that want to stay with Intel, it is an easy upgrade path. Those who bought 16-core parts in 2017-2020 now have a SKU stack designed for 2:1 or greater consolidation, with an emphasis on 32-core parts this generation. Likewise, the 28-core Platinum 8180 and 8280 top-bin parts now have 56- and 60-core options to achieve a 2:1 consolidation. We even saw the impact of moving from 4-socket Platinum 8280 and 8380H systems to 2-socket Platinum 8480 systems.

4th Gen Intel Xeon Scalable Sapphire Rapids Launch Core Count Distribution

Still, it feels like enabling some acceleration across the stack (or even just the majority of the stack) would have gone a long way in bolstering the competitive story.

Final Words

For many organizations, the new processors are going to be game-changing. Make no mistake; this is the biggest upgrade to Xeon in over a decade. Not only do we get 50% more cores than a generation ago, but we have a jump in PCIe lanes, PCIe Gen5, CXL 1.1, DDR5, and a host of onboard acceleration capabilities.

4th Generation Intel Xeon Scalable Sapphire Rapids 14

Intel knows what AMD launched with Genoa. Some of the Sapphire Rapids list pricing looks almost like it was designed to be discounted. The market will sort that out. While Intel does not have a direct socket-to-socket top-bin competitor to AMD, what it does have is a range of products under 200W TDP, almost as many 32-core SKUs as AMD has in its entire EPYC 9004 SKU stack, and scale. These lower-power, lower-core-count SKUs move volume, ensuring demand for Sapphire Rapids parts both for Intel and for its server OEMs.

Supermicro SYS 221H TNRR 2U Intel SPR CPU And Memory 5

Many will see the launch today and be shocked by a $17,000 Platinum 8490H. Realistically, those are CPUs designed for massive scale-up systems where TCO is measured in hundreds of thousands of dollars per system, and list pricing is often heavily discounted at the system level.

4th Generation Intel Xeon Scalable Sapphire Rapids 1

Intel has two very different performance stories, one with accelerators and one without. It can be competitive in many segments without them, but with them, Intel has the ability to achieve outsized performance-per-core gains.

Still, Sapphire Rapids has matched AMD’s 50% generational core count improvement for 2023 servers. There is still a lot of 2023 left, and more CPUs will be launched. Perhaps the best part of the Sapphire Rapids launch is that we are seeing the direct impacts of competition in the market. That is perhaps the most important factor for server buyers with this launch.

34 COMMENTS

  1. Wow … that’s a lot of caveats. Thanks for detailing the issues. Intel could really do with simplifying their SKU stack!

  2. Not sure what to think about power consumption.

    Phoronix has average power consumption reported by sensors that is ridiculously high, but here the peak power plug consumption is slightly less than Genoa.

    Someone needs to test average plug power on comparable systems (e.g. comparable nvme back-end).

  3. This is like BMW selling all cars with heated seats built into them and only enabling it if you pay extra.

    Intel On Demand is a waste of engineering, of silicon, of everything, to please shareholders.

  4. I’ve only made it to the second page but that SKU price list is straight up offensive. It feels like Intel is asking the customer to help offset the costs of their foundry’s missteps for the past four years.

    The segmentation is equally out of control. Was hoping Gelsinger was going to rein it in after Ice Lake but I got my answer loud and clear.

  5. New York Times: “Inside Intel’s Delays in Delivering a Crucial New Microprocessor

    The company grappled with missteps for years while developing a microprocessor code-named Sapphire Rapids. It comes out on Tuesday.”

    – NOT how you want to get free publicity for a new product!

  6. I was so focused on Intel having fewer cores than AMD with only 60 that I forgot there's still a big market for under-205W TDP CPUs. That's a good callout, STH

  7. Intel did similar things when they lost track versus RISC/AMD back in the day. Itanium, Pentium 4 (NetBurst), MMX, and SSE were the answers they used to stay relevant.

    P4s overheated all the time (I think they have this solved today with better cooling, but power is still a heavy draw).

    MMX and SSE were good accelerations, complicating compilers' and developers' lives, but they existed on every Intel CPU, so you had a guaranteed baseline for all Intel chips. Not like this mess of SKUs and lack of predictability. QAT has been around a while and has lots of software support, but the fact that it's not in every CPU holds it back.

    The one accelerator that doesn’t need special software is HBM yet they limit that to too few SKUs and the cost is high on those.

    This is not a win for Intel…this is a mess.

  8. I’ve just finished reading this after 90min.

    THAT is someone who’s got a STRONG understanding of the market. Bravo.

    Where’s the video tho?

  9. There is something wrong with the pricing for these products.

    Especially with accelerators there is a price thing going on:
    -QAT can’t compete with DPUs; as you mentioned those cost $300 more than a NIC
    -AMX on $10k+ CPUs (with 56 or 60 cores) can't compete with a $1500 GPU, while consuming much more power than a smaller CPU plus the GPU would.

    These sticker prices might not be final prices. High-core-count Genoa is also available now at ~20% under MSRP from European retailers. I don't really trust MSRP for this generation.

  10. @Lasertoe – What we’re seeing here is the first step towards the death of the DPU. What is going to end it is Intel integrating networking fabrics on package, so you can dynamically allocate cores to DPU tasks. This provides flexibility, bandwidth, and latency such that dedicated external cards will quickly disappear.

    Intel isn’t doing themselves a favor by having their on-die accelerators behind the On-Demand paywall.

  11. Hello Patrick
    I suspect you will earn lots of money if you could monetize your Intel SKU excel sheet 🙂
    How on Earth can I pick the best CPU for my workloads?
    Are there any tools that could identify which accelerators might be helpful for my workloads?

    The whole concept of On Demand is kind of rotten.
    I deploy the platform, I migrate the workloads, I realize that maybe some additional accelerator would be beneficial (how?), I purchase the extra feature (and it won't be cheaper than if purchased from the get-go), and then I need to trigger a workload-wide software refresh to the acceleration-enabled version?
    Hard to see that.
    Sorry, but if the accelerators are meant to be decision factors, they need to be widely adopted; they need to be a must, a no-brainer. And they need to have a guaranteed future.

  12. I’m extremely confused how NONE of the “Max” SKUs are being offered with ANY of the onboard accelerators! (other than DSA, which seems like the least helpful accelerator by far.)

    Is that a typo? The Max SKUs don’t even offer “on demand”?

  13. @Kevin G:

    I don’t think that will happen. I think Intel and AMD will both integrate DPU-like structures into their server CPUs.

    Allocating cores “towards DPU tasks” is already possible when you have an abundance of cores like Genoa (and even more so with Bergamo). The DPU advantage is that those (typically ARM) cores are more efficient, don’t need a lot of die area, and don’t share many resources with the CPU (like caches and DRAM).

    I can see a future where efficient cores with smaller die area like Zen 4c or Atom (or even ARM/RISC-V) work alongside high-performance cores for DPU tasks, but they need independent L3 caches and maybe DRAM.

  14. Well, have to admit, I didn’t think there would be anything below the $1,500 mark. Granted, there’s not much, but a few crumbs. Now to see if you can actually get those SKUs.

    Not buying the power levels until I see some actual test results. Frankly, the lack of accelerators on so many of the high-end SKUs definitely raises a few doubts as well. Why leave the thing you’ve been hyping up all this time out of so many SKUs, and does this mean that there are 4-5 different chip lines being manufactured? I thought one of the main angles was that they could just make a single line, bin those to make the variations, and offer the unlocks on all the models?

    Just waiting for all the “extras” to become a recurring subscription. You want the power efficiency mode turned on? That’s $9.99/hr/core.

  15. Can anyone explain the difference between the Gold 5000 and Gold 6000 series? I can’t find any rhyme or reason to the distinction.

    Adding to the confusion, the Gold 5415+ actually appears to be substantially worse than the Silver 4416+, and the Silver 4416+ costs $110 more. Why would a Silver processor cost more than a Gold processor and be better? There’s a pretty meaningless-looking distinction in base clocks, but given where the all-core turbo is at, I would bet that loading 8 cores on the 4416+ would yield clock speeds that aren’t far off from the all-core turbo clock speed of the 5415+… and then you still have another 12 cores you can choose to load up on the 4416+, with over 50% more cache!

    The SKU matrix doesn’t seem very well considered. I also agree with Patrick’s comments on the confusing state of the accelerators; I think Intel should have enabled 1 of every accelerator on every single SKU, at a minimum. If they still wanted to do “On Demand”, that could allow users to unlock the additional accelerators of each type, but even having 1 would make a significant performance difference in workloads that can use them, and it would be an effective way to draw the customer into buying the licenses for additional accelerators once they are already using them.

  16. Will be interesting to see the HEDT platform later and how it will perform compared to Raptor Lake, Ryzen, and of course Threadripper, and also if they have some new things beyond PCIe 5 and DDR5, or if they cripple it as they did with x266

  17. What an absolute mess. The naming has been awful since the whole “Scalable” marketing debacle but this is taking it to the next level. Was hoping they would sort it this generation. Sigh.

  18. Accelerators have a chicken-vs.-egg adoption challenge. Intel hedged its bet with “on demand,” which makes adoption failure a self-fulfilling prophecy.

  19. I don’t know if anyone noticed, but in the chart on page 12 where Intel basically denounces the SPEC benchmarks they put “Gaming” twice in the “Customer workloads” set in relation to the release of a Xeon line.

  20. A lot of games require servers for multiplayer gaming, don’t they? Then of course you have cloud gaming, which is much smaller, I’d imagine.
    It does seem odd that they selected two customers with gaming workloads when there aren’t so many total.

  21. “On Demand” is bullshit. It’s nothing more than artificial scarcity, a.k.a. the Comcast model. I would be very angry if I paid for all of those transistors and over half of them were locked behind an additional paywall.

  22. Thanks for the nice article. Unfortunately, on general-purpose computing it seems Intel is still trying to catch AMD, and not successfully.
    I’m using Phoronix benchmark geometric means (from a set of benchmarks) divided by the specified CPU TDP here, i.e. benchmark number / TDP = X. This basically shows processing efficiency relative to declared TDP. Higher number, better efficiency.
    Intel 8280: 1.35
    Intel 8380: 1.46 — looks like the 14nm -> 10nm transition was moderately successful
    Intel 8490H: 1.7 — again, 10nm -> Intel 7; although it should be basically the same, it looks like Intel did their homework and improved quite a lot.
    AMD 9554: 2.3 — and this is from a completely different league. TSMC simply rocks, and AMD is not even using their most advanced process node.

  23. Not sure if I get it right, but it does seem like the 8490H and 8468H have all accelerators enabled, from the table you compiled

  24. I don’t find these particularly compelling vs. AMD’s offerings. The SKU stack is of course super complicated, and the accelerator story doesn’t sound very compelling – it also raises the question of whether one can even use these with virtualization. And I don’t think most software supports the accelerators out of the box, with the possible exception of QAT. The on-demand subscription model also bears the risk that Intel might not renew your subscription at some point.

  25. Those SPECint numbers are ******* BRUTAL for Intel. I’m sure that’s really why they’re saying it’s not a good benchmark. If it’d been reversed, Intel would say it’s the best.

  26. I’d agree on the speccpu #’s.

    I read this. It took like 2hrs. I couldn’t decide if you’re an Intel shill or being really critical of Intel. I then watched the video and had the same indecision.

    I’d say that means you did well presenting both sides. With so much garbage out there, at least there’s one place taking up the AnandTech mantle.

  27. Amazing review. It’s by far the most balanced on the Internet on these. I’ll add my time, it took me about 1.25 hours over 3 days to get through. That isn’t fast, but it’s like someone sat and thought about the Xeon line and the market and provided context.

    Thx for this.

  28. I think Intel is on the wrong path.
    They should be making lower-powered CPUs.

    Their lowest-TDP CPU is 125W, and it’s a measly 8-core part with a 1.9GHz max boost frequency – I think something is wrong in Intel’s development department.

    A 1.9GHz boost frequency should not require a 125W TDP.

  29. Patrick’s SKU tables show the 8452Y as MCC, but that’s clearly impossible since it has 36 cores. It should be XCC (which would also match Intel’s table).

    I didn’t try to check all the others. 🙂
