Intel Xeon MAX 9480 Deep-Dive 64GB HBM2e Onboard Like a GPU or AI Accelerator

Intel Xeon Max Chip 3

Today we have a piece that took months to write, and the best we can hope is that it gives a sense of what Intel’s coolest CPU is capable of. The Intel Xeon Max 9480 combines 56 cores with memory on the package. That memory is not standard DDR5. Instead, it is 64GB of HBM2e, the same kind of memory found on many GPUs and AI accelerators today. What seemed like a straightforward review at the outset became absolutely fascinating, especially when we pulled all of the DDR5 memory from a system and watched it boot. Let us get to it.

Intel Xeon Max 9480 Overview

As one might imagine for a piece this big, we have a video:

We are also going to say Intel is sponsoring this piece. The company, for example, sent not just CPUs but also a full development platform, since we needed a system we knew supported the new chips. As always with STH content, Intel is seeing this for the first time when it is published, as we do not pre-share results or let vendors review content before it goes live.

Intel Xeon Max Dev Platform Angle

In many ways, the Intel Xeon Max 9480 resembles an Intel Xeon Platinum 8480+. Both have 56 cores, but there is one difference visible even in the lscpu output: the cache. The Xeon Max has a full 112.5MB of L3 cache while the Platinum 8480+ has 105MB. This, of course, is not the big difference.

Intel Xeon Max 9480 Lscpu Output NPS 4

Both of the 350W CPUs have somewhat similar clock speeds. The Xeon Max 9480 has a base clock of 1.9GHz, 100MHz lower than the Xeon Platinum 8480+. Turbo clocks are further apart at 3.5GHz vs. 3.8GHz. Still, those clock speeds are not the big differentiator. Instead, it is the four 16GB HBM2e packages, 64GB in total, that are added to the Intel Xeon Max.

Here is a quick look at a standard 56-core 4th Gen Intel Xeon Scalable package with four compute tiles. This is what the Platinum 8480+ looks like underneath its heat spreader.

Intel Vision 2022 Sapphire Rapids Top 1

Here is the Intel Xeon Max. One can see that we have the four compute tiles, but each has its own HBM2e package next to it.

Intel Vision 2022 Sapphire Rapids HBM Top 2

When we first saw the package in early 2022, the “winglets” were noticeable but we thought they were just there for a development chip. We were wrong. These winglets add extra space to the package for some of the components moved to make room for the HBM2e.

Intel Xeon Max Chip 3

This is far from the first time we have seen Intel sell special versions of chips with functionality on protruding packages. Below is the 2017-era Omni-Path fabric part next to the non-OPA 1st Gen Intel Xeon Scalable.

Intel Xeon Scalable Fabric V No Fabric

Still, this Xeon Max iteration of a chip outcropping is a bit less exciting.

Intel Xeon Max Chip 2

Just to be a bit more complete, here is the back side of the winglets. One can see that there are no pins added to them.

Intel Sapphire Rapids HBM Pad Side 1

The Intel Xeon Max CPU that Intel sent for review today is the Intel Xeon Max 9480. That is the highest-end part in the Intel Xeon CPU Max series (we are just calling these Xeon Max because we never hear folks use “CPU Max” in conversation). Of the five SKUs, the Xeon Max 9462 is perhaps the second most intriguing at 32 cores with 64GB of HBM2e. That gives 2GB of memory per core, which may be a better ratio for some applications. It also fits better in VMware and Microsoft Windows Server licensing for those who want to use these for general-purpose servers.
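To make the memory-per-core comparison concrete, here is a minimal Python sketch of the arithmetic. It uses only the core counts and the 64GB HBM2e capacity quoted in this article; the dictionary layout and the `hbm_per_core` helper are illustrative names of our own, not anything from Intel.

```python
# HBM2e capacity and core counts as quoted in the article.
# Only the two SKUs whose core counts are stated in the text are listed.
HBM_GB = 64

skus = {
    "Xeon Max 9480": 56,
    "Xeon Max 9462": 32,
}


def hbm_per_core(hbm_gb: float, cores: int) -> float:
    """Return on-package HBM2e capacity per core, in GB."""
    return hbm_gb / cores


# Compute the ratio for each SKU and print it to two decimal places.
ratios = {name: hbm_per_core(HBM_GB, cores) for name, cores in skus.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GB of HBM2e per core")
```

By this math, the 56-core part has a little over 1GB of HBM2e per core, while the 32-core part lands on an even 2GB per core.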

Intel Xeon Max 2023 CPUs

While we do not have the other CPUs in the series, what we do have is a development platform, so let us take a quick look at that before moving on.

11 COMMENTS

  1. Terabyte per second STREAM is spectacular – this is comparable speed from a single server to running STREAM across an entire Altix 3700 with 512 Itanium processors in 2004, and rather faster than the NEC SX-7 which was the last cry of vector supercomputers.

  2. Despite what Intel stated about power states, I’d have at least tried booting the Xeon Max chip on a workstation board. Worth a try, and it would open up a slew of workstation/desktop style benchmarks. While entirely inappropriate for a chip of this caliber, I’m curious how an HBM2e-only chip would run Starfield, as it has some interesting scaling affected by memory bandwidth and latency. It would be different to have that HBM2e comparison for the subject.

  3. The OpenFOAM results don’t match between the two plots. One says HBM2e only is 1.85 times faster and the other says it’s only 1.05 times faster.

  4. Can these be plugged into a normal workstation motherboard socket? As in, a few years from now when these come on the market and mortals can buy them off of eBay, we want to play with them in normal motherboards with normal air cooling solutions.

  5. I had no idea that they’re able to run virtualization. I remember that I’d seen them at launch but I was under the impression that they’re only for HPC and that they’d done no virtualization and acceleration because of it. We’re not a big IT outfit, only buying around 1,000 servers per year, but we’re going to check this out. Even at our scale it could be useful.

  6. Is that a real Proxmox VE pic? I didn’t think these could run virtual machines. Why didn’t Intel just call these an option if so? That 32c 64GB part sounds chill.

  7. It’s possible virtualization is not an advertised feature because there are too many information-leaking side channels.

    At any rate, as demonstrated by the Fujitsu A64FX a couple years ago, placing HBM on the CPU package makes GPUs unnecessary and is easier to program. After the technology has been monetised at the high end, I expect on-package HBM will be cheaper than GPU acceleration as well.

  8. Thank god there’s a good review of this tech that normal people can understand. This is the right level STH. I’m finally understanding this tech after years of hearing about it.

  9. That STREAM benchmark result is impressive.

    My 4GHz 16-core desktop computer copies arrays of doubles at 58GB/sec, according to my STREAM build with MSVC, and I consider that pretty decent, because it copies about 15 bytes per CPU clock cycle.

    The Intel compiler should optimize the STREAM copy loop over double arrays with very efficient SIMD instructions.
