Why Acceleration Matters for Sapphire Rapids
Intel had an observation that was partly self-serving but also valid: SPEC CPU2017 largely focuses on scaling relatively simple tasks across cores, and it scales very well. Intel's point is that its customers are instead running VMs, containers, AI, and more. We are at an interesting point where SPEC CPU2017 is still the standard, yet whenever we talk with larger hyperscalers, they are less excited about its applicability to their workloads. That aligns with Intel's slide, and also with the figures from our own SPEC CPU2017 testing.
We actually did an entire deep dive on the acceleration. We were using pre-production Platinum 8490H parts with all accelerators turned on, but we were not allowed to share the model number at the time. The summary, though, is that when we hit accelerators, either via on-core instructions or on-package accelerators, performance is very different.
We are going to punt a bit to the Hands-on Benchmarking with Intel Sapphire Rapids Xeon Accelerators piece so you can see the impact of the Sapphire Rapids accelerators across things like the in-core AMX and the QAT accelerators.
Still, we wanted to show why acceleration matters in a use case that was pertinent to us. As a result, we bootstrapped the nginx QAT acceleration on the 32-core 8462Ys, then ran the full STH nginx stack with the database (minus back-end tasks like backups/replication and such) all on a single node, and compared it to the AMD EPYC 9374F. Here is what we saw:
QAT is only accelerating a portion of the workload, perhaps 10%, but one that can introduce jitter. At the same time, using the QAT offload changes which solution ends up ahead, even though the EPYC 9374F is a 3.85GHz base clock chip with 256MB of L3 cache while the Xeon is a 2.8GHz base clock chip with 60MB of L3 cache.
The reason we highlighted the fact that most 4th Gen Intel Xeon SKUs do not have QAT enabled is that QAT is a path to per-core performance gains for Intel in this generation. We will note, though, that the QAT setup required an hour or two, and that was with the recipe already in hand.
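For readers curious what that recipe looks like, here is a minimal sketch of the TLS offload portion. This is an illustration, not our exact configuration: it assumes Intel's QAT_Engine ("qatengine") is registered with OpenSSL and an nginx build carrying Intel's asynchronous-mode patches; directive names and paths may differ by driver and build.

```nginx
# Hypothetical sketch: offloading TLS handshake crypto to QAT in nginx.
# Assumes Intel's QAT_Engine is installed as "qatengine" and nginx was
# built from Intel's async-mode fork, which provides this module.
load_module modules/ngx_ssl_engine_qat_module.so;

ssl_engine {
    use_engine qatengine;
    default_algorithms RSA,EC,DH,PKEY_CRYPTO;
    qat_engine {
        qat_offload_mode async;   # asynchronous offload to the accelerator
        qat_notify_mode poll;     # poll for completed crypto operations
    }
}

http {
    server {
        listen 443 ssl;
        ssl_asynch on;            # enable async TLS processing for this server
        ssl_certificate     /etc/nginx/server.crt;   # placeholder paths
        ssl_certificate_key /etc/nginx/server.key;
    }
}
```

The build and driver install around this config is where most of the hour or two goes; the nginx side itself is only a few directives.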
4th Gen Intel Xeon Scalable Sapphire Rapids: Power Consumption
In terms of power consumption, the Sapphire Rapids systems were actually better than we expected. Our Supermicro dual socket test platform was fairly consistently hitting peaks of 900-950W using the Intel Xeon Platinum 8490H 60-core parts.
We had this system running side-by-side with the ASUS RS720A-E12-RS24U with the 96-core parts. AMD has a slightly higher TDP, but the Intel server used less power. Part of that may be due to the PSUs. Other parts may be due to the cooling configuration and the fact that we had four fewer DIMMs per socket, or eight total, which accounts for roughly 40W, plus ~15% cooling overhead, so around 46W from the memory difference alone.
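The memory delta estimate above works out as follows. The ~5W-per-DIMM figure is our assumption for a loaded DDR5 RDIMM, as is the 15% cooling overhead:

```python
# Rough sanity check on the memory power delta between the two platforms:
# four fewer DIMMs per socket across two sockets = eight fewer DIMMs.
DIMMS_FEWER = 4 * 2            # 4 fewer DIMMs per socket, dual-socket system
WATTS_PER_DIMM = 5.0           # assumed ~5W per DDR5 RDIMM under load
COOLING_OVERHEAD = 0.15        # assumed ~15% extra fan power to remove that heat

dimm_watts = DIMMS_FEWER * WATTS_PER_DIMM          # raw DIMM power delta
total_watts = dimm_watts * (1 + COOLING_OVERHEAD)  # delta including cooling

print(f"DIMM delta: {dimm_watts:.0f}W, with cooling: {total_watts:.0f}W")
```

Small per-DIMM differences add up quickly at 12 DIMMs per socket, which is why we call it out when comparing wall power across platforms.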
Our single-socket Supermicro platform was consistently in the 400W range with the slightly lower-end SKUs. While that may seem like a lot, it is actually important. For things like VMware licensing, consolidating two older PCIe Gen3 Skylake-era 16-core sockets to a single 32-core processor system saves on license fees. Power consumption is likely equal or lower. One also gets more PCIe bandwidth and lanes and more memory bandwidth than older two-socket servers. Intel finally has a straightforward single-socket consolidation case that fits into 32-core license packs easily while maintaining a similar power footprint.
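The license math behind that consolidation case can be sketched quickly. This assumes a per-CPU license model with 32-core packs, in the style of VMware's per-32-core licensing; the helper function is ours, purely for illustration:

```python
import math

def licenses_needed(sockets: int, cores_per_socket: int, pack_size: int = 32) -> int:
    """Licenses for a per-CPU model where each license covers up to pack_size cores."""
    # Each socket needs enough 32-core packs to cover its own core count.
    return sockets * math.ceil(cores_per_socket / pack_size)

old = licenses_needed(sockets=2, cores_per_socket=16)  # two Skylake-era 16-core sockets
new = licenses_needed(sockets=1, cores_per_socket=32)  # one 32-core Sapphire Rapids socket
print(old, new)  # two licenses consolidate down to one
```

A 16-core socket still burns a whole 32-core pack, which is exactly why the two-socket-to-one consolidation halves the license count.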
Next, let us get to our market impact and final words.