Intel formally announced a new class of Intel Xeon Scalable processors. Codenamed “Cascade Lake-AP” or Cascade Lake Advanced Performance. The disclosure was extremely brief, but it will not stop us from speculating what is happening. Intel Cascade Lake-AP is a new processor with a singular focus: bringing Intel Xeon Scalable back to the forefront of per-socket performance. In this article, we are going to show what Intel disclosed, then speculate on what the company might be doing behind the scenes.
As the AMD EPYC ecosystem has matured, performance and compatibility has gotten better while more systems are now adopting EPYC. Intel is still the market share forerunner so it makes sense that they want a halo product again. AMD is able to claim they have 32 cores and 8 memory channels with the AMD EPYC 7601, while the Intel Xeon Platinum 8180 only has 28 cores and 6 memory channels. Intel wants the halo again, especially since it knows that its 28 core Cascade Lake-SP (successor to the current Skylake-SP) is going to be put under pressure from AMD’s next-generation “Rome” EPYC 2 series.
Background: Intel Cascade Lake-SP
Intel Cascade Lake-SP made its major debut at Hot Chips 30. Here is our article Resetting Scalable Expectations Intel Cascade Lake-SP at Hot Chips 30. In that article we issued a challenge: Intel needed a better story. A new Intel Cascade Lake-AP is that response. In the meantime, here is the basic overview of Cascade Lake-SP.
Here we have the same PCIe 3.0 lanes, the same memory speed, the same UPI speed, the same maximum core count and so forth. There are some practical constraints. The LGA3647 socket only has so many pins for PCIe, UPI, and memory channels.
We noted in our previous story that a 28 core Intel Cascade Lake-SP CPU will not be a compelling option as a halo product compared to AMD EPYC “Rome”. Let us take a look at the Advanced Performance disclosure then make some speculation.
The Intel Cascade Lake-AP Disclosure
Intel made a very brief disclosure around Cascade Lake-AP. As a note here, we generally do not put too much weight in the numbers, but we are going to make a leap in our speculation section. Here is the slide Intel showed:
You can see, Intel Cascade Lake-AP is a multi-chip package. We are told that it is a two-chip package so we can infer it is essentially two 24 core chips packaged together. Each package has twice the memory channels of a Cascade Lake-SP part at 12 per package. That yields 24 lanes in a two-socket server which is the same as today’s Skylake-SP four-socket servers.
Although we do not normally comment on benchmark numbers, we wanted to highlight the STREAM Triad “Up to 1.3X” figure. This is absolutely intriguing. AMD EPYC has 8 channels of DDR4-2666 versus Intel Xeon Scalable Skylake-SP or Cascade Lake-SP’s 6 channels. Since the memory speed is the same, AMD claims a two-channel or a 33% advantage over Intel Xeon. The question is: with 12 channels per socket or BGA package, how does one get to a 1.3x memory bandwidth advantage. If we hold DDR4-2666 constant, we have 12 channels for Cascade Lake-AP and 8 channels for AMD EPYC 7601 “Naples.” That would be a 50% advantage or 1.5x the memory bandwidth. Getting to a 1.35x performance factor would imply using 12x DDR4-2400 (Cascade Lake-AP) versus 8x DDR4-2666 (Naples.) This would be truly strange to see memory speeds go down but the math is hard to work out otherwise if one has 50% more memory channels.
Speculation Points Around Intel Cascade Lake-AP
Since the disclosure was so minimal, we are left to speculate what this may be. We specifically did not ask any of our sources about the new parts, this is solely based on the slide above, what Intel has disclosed to STH about Cascade Lake-SP, and some inferencing.
Speculation Point #1: Intel Cascade Lake-AP Cannot Use LGA3647
Since LGA3647 was intended for a single die module with six memory channels and Optane Persistent memory support, it was designed for a time when 48x PCIe lanes and 6 channel memory was a major update over Intel Xeon E5 generations. It was not envisioned to take on something as large as AMD EPYC. There are a limited number of pins, significantly less than AMD uses because Intel does not have as many memory channels nor PCIe lanes to connect. As a result, expanding I/O in the current socket is nearly impossible. To utilize up to 12x memory channels, and if Intel wants to have PCIe leadership or parity again, Intel needs well over 5000 pins.
Further, a 24 core Intel Skylake/ Cascade Lake die uses a lot of power, two would run into a thermal issue even if you accept the notion that the LGA3647 has a much higher TDP than the 205W maximum of the Intel Xeon Platinum 8180 (it does.)
Between needing more pins to connect to memory and PCIe as well as additional thermal headroom needed, we come to our first point of speculation that Cascade Lake-AP will not be based on LGA3647.
Speculation Point #2: New Packaging for 96x or 128x PCIe Lanes
The other intriguing possibility is what happens with PCIe. Intel has the capability to deliver 6x PCIe x16 root complexes per dual die package easily. This is the standard configuration with 48x PCIe lanes per Intel Xeon Skylake-SP generation CPU die. The wildcard is the on-package “PCIe x16” interface used to deliver features such as Omni-Path in the Skylake-SP generation.
You may recall from our 2017 article that Intel has a fourth PCIe x16 solution onboard Skylake-SP, even if it is not widely used. Where the Cascade Lake-AP design could get completely wild is if Intel decides in Cascade Lake to make this more than just an on-package PCIe root and instead exposes it to the outside world.
For LGA3647 servers, this is not a viable path. There are only so many pins available in the package. Cascade Lake-AP, on the other hand, needs to be a larger package to handle the dual die plus already stated 12x DDR4 channels. That gives Intel the opportunity to potentially add pins and add the fourth PCIe x16 complex to the new design. If it does so, it will have 64x PCIe lanes per die, and 128x PCIe lanes per package, or 256x PCIe lanes in a dual Cascade Lake-AP server, bringing it to parity with current generation AMD EPYC. One could argue that such a configuration would go beyond AMD EPYC because the PCH has SATA connectivity and PCIe connectivity which means that the Intel solution would effectively have more lanes than AMD.
If Intel wants PCIe (lane) leadership with Cascade Lake-AP, that is how Intel can get it done with a new socket and a dual package configuration.
Speculation Point #3: UPI Interconnect Is Making This Possible
On the call the question was if this solution was using EMIB, a multi-chip packaging technology that Intel is pushing as a path forward. Intel stated that Cascade Lake-AP is not using EMIB and is using UPI. STH readers may recall that UPI is Ultra Path Interconnect and is the technology Intel introduced to replace QPI (Quick Path Interconnect) with Intel Xeon Scalable. Notably, UPI is the technology that is used in dual, quad, and octo socket configurations. Here is the Intel diagram from our Intel Xeon Scalable Processor Family Platform Level Overview.
This should look intriguing to our readers. Here is why: take the four-socket configuration, and draw package boxes a bit bigger around two dice instead of just one. Here is an example using a current generation quad socket Intel Xeon Scalable system from Supermicro. Assume that the Cascade Lake-AP dual socket solution will have the capabilities above and below the cross brace bar in a single socket. That is an enormous amount of DIMMs per server.
Assuming one uses 3x UPI CPUs, like the current generation Intel Xeon Gold 6100 and Platinum 8100 generation, one has a potential Cascade Lake-AP product. There are two 24 core chips on a single package with UPI between. There are package-to-package interconnects via UPI as well. Each die has direct access to every other die in a dual CPU configuration. Here is where things get more exciting. Four silicon chips, each with interconnect fabric between them yields a topology like this:
This is the same topology as an Intel Xeon Gold 6100 and Xeon Platinum in the four-socket server shown above, but with two packages instead of four. That is also AMD EPYC “Naples” generation topology, just with UPI instead of Infinity Fabric. UPI is a significantly higher performance interconnect than the first generation Infinity Fabric.
While Intel may have something else going on, and this is all speculation given the extremely limited disclosure, a design like this would allow Intel to use IP blocks and package them differently to claim a much larger per-socket core count, memory channel count, and PCIe count. It would also be a relatively easy design to quickly implement versus building its largest ever CPUs.
We have taken Intel’s Cascade Lake building block and barebones Cascade Lake-AP disclosure and gotten to the point where we have a viable halo product once again for the company. We had to speculate on some of the details, which may prove completely incorrect. On the other hand, these speculative points are reasonable if one thinks that Cascade Lake-AP was created quickly as a response to AMD EPYC and did not undergo a long development cycle.
Assuming Intel keeps the full features of the chips, which it can, we can have:
- Dual socket capable design (Intel disclosed this.)
- Two die design with up to 24 cores per die (Intel disclosed this.)
- 12 memory channels per socket (assuming socketed.)
- 96-128 PCIe lanes per processor or 192-256 lanes per system (see Speculation #2)
- A UPI enabled topology similar to today’s four socket Intel Xeon Scalable designs
In the end, this may be a competitive product against today’s’ AMD EPYC 7601. By the time Intel is ready to release Cascade Lake-AP it may find itself competing against a second generation AMD EPYC part with more cores and potentially PCIe 4.0. Intel Cascade Lake-AP is positioned to fight that battle.
If our Speculation #3 is correct, and this is essentially what we would call today’s 4-socket systems re-packaged into a 2-socket design, then this is going to be a relatively easy product to produce. At the same time, it is going to have an enormous impact on software licensing. VMware, as an example, is licensed on a per-socket basis. Using Cascade Lake-AP for VMware will half the cost of VMware licensing versus a quad-socket Intel Xeon Scalable Skylake-SP platform. For those buying per-socket licensed software today, it may be worth assessing what happens to those licenses as Intel and AMD enter into a core war in 2019.