Today we get the launch of the new Intel Data Center GPU Max series. The “Max” branding is what Intel is rolling out for its HPC families of CPU and GPU products. While we have seen the new GPUs, codenamed Ponte Vecchio, before, Intel is now talking about specific products ahead of SC22 in Dallas next week.
New Intel Data Center GPU Max at SC22 Including PCIe and OAM
Here is the Intel Max series summary slide. We are going to focus on the right side of this diagram for covering the new GPU series.
The new Intel Max series GPU, formerly “Ponte Vecchio”, scales to two stacks of Intel’s Xe GPU cores, 8 HBM2e controllers, and 16 Xe links.
Ever since we first heard the Rambo Cache name at SC19, it has been clear that this is a major feature. Each of the new Max GPU series stacks has 204MB of L2 cache, meaning that the full two-stack package can have up to 408MB of L2 cache.
The “stacks” are important for understanding the product line. Ponte Vecchio is a next-generation accelerator, and power targets for the industry are now well beyond PCIe power and cooling capabilities. Still, PCIe form factors are popular since they are easy to integrate. Intel plans to offer the Data Center GPU Max 1100 as a PCIe accelerator. This is a single-stack solution, so it has less compute, cache, and HBM2e memory, but it fits into a 300W TDP envelope.
Intel also said it will have Xe Link bridges to connect up to four GPU Max 1100 units.
The full GPU Max series packages will be the OAM modules. Intel will have the Max Series 1350 GPU with a 450W TDP and 96GB of HBM2e. The larger OAM module, called the Max Series 1550 GPU, is the full part with a 600W TDP, 128GB of HBM2e, and all 128 Xe cores.
Those individual TDPs are important. As we have seen with NVIDIA and its A100 “Redstone” platform in reviews like the Dell EMC PowerEdge XE8545 review, Intel is going to be selling OAM subsystems with these modules. For the GPUs alone, 450W x 4 = 1.8kW while 600W x 4 = 2.4kW. Intel did not discuss larger OAM subsystems, but it has solutions like the 8x OAM solution with the Habana Labs Gaudi2.
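To make the subsystem power math concrete, here is a minimal Python sketch that multiplies the per-module TDPs quoted above by the module count. The function name and dictionary are our own illustration, not anything from Intel, and the result covers only the GPU modules, not CPUs, memory, fans, or power supply losses.

```python
# Per-module TDPs as quoted in the article, in watts.
OAM_TDP_WATTS = {
    "Max Series 1350": 450,
    "Max Series 1550": 600,
}


def subsystem_gpu_power_kw(module_tdp_watts: int, modules: int = 4) -> float:
    """Combined GPU TDP of an OAM subsystem in kilowatts.

    Hypothetical helper: it only sums GPU module TDPs and ignores
    the rest of the server's power budget.
    """
    return module_tdp_watts * modules / 1000


for name, tdp in OAM_TDP_WATTS.items():
    print(f"4x {name}: {subsystem_gpu_power_kw(tdp):.1f} kW")
```

Passing modules=8 would model an 8x OAM layout, but that is purely hypothetical here since Intel did not discuss larger subsystems for these GPUs.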
Between the different OAM modules, Intel has a 4x Xe Link solution so that each GPU can address the other GPUs directly. It took until the V100 generation for NVIDIA to fully embrace this type of SXM-to-SXM NVLink topology (while eventually adding NVSwitch in that generation).
We are more focused on products in this piece, so we are not going to get too deep into Xe Link.
Intel also showed its next-generation “Rialto Bridge” Data Center Max GPU series. These will feature 800W per OAM and be liquid cooled while offering features like 25% more Xe cores. Rialto Bridge is the refinement generation of Ponte Vecchio.
Intel expects the new GPUs to fit in the same subsystem, making system design easier for server vendors.
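As a rough sanity check on the Rialto Bridge figures, the sketch below compares them with the full Ponte Vecchio OAM module. It assumes the 25% Xe core increase is measured against the full 128 Xe core, 600W Max Series 1550, which the announcement as summarized here does not state explicitly. The function names are ours.

```python
# Figures quoted in the article.
PONTE_VECCHIO_XE_CORES = 128  # full Max Series 1550
PONTE_VECCHIO_TDP_W = 600     # Max Series 1550 OAM TDP
RIALTO_BRIDGE_TDP_W = 800     # per OAM
RIALTO_CORE_UPLIFT = 0.25     # "25% more Xe cores"


def implied_rialto_xe_cores() -> int:
    # Assumption: the uplift is relative to the 128-core part.
    return round(PONTE_VECCHIO_XE_CORES * (1 + RIALTO_CORE_UPLIFT))


def tdp_increase_fraction() -> float:
    # Fractional increase in per-module TDP between generations.
    return RIALTO_BRIDGE_TDP_W / PONTE_VECCHIO_TDP_W - 1


print(f"Implied Xe cores: {implied_rialto_xe_cores()}")
print(f"TDP increase per OAM: {tdp_increase_fraction():.0%}")
```

Under that assumption, the per-module power budget grows by about a third while the core count grows by a quarter.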
It is great that we are getting actual product names and specs for Ponte Vecchio. Intel said that it will not have a Top500 Linpack run with Aurora for SC22 in Dallas next week, but it does have a test system installed. A few things are worth noting here. First, the industry is moving towards the XPU design, which for Intel is Falcon Shores. We already covered that in our Intel Falcon Shores XPU Update at SC22 piece. Still, current computing models will be around for some time. Second, the PCIe form factor is proving challenging for GPUs as power and cooling requirements increase. PCIe is easy to deploy in standard servers, but it is not ready for the future.
Finally, we have been talking about the product for years, so it is good to see Ponte Vecchio take the next step in becoming a product by getting specs and model numbers.