Intel Ponte Vecchio is a Spaceship of a GPU

6

Intel Ponte Vecchio Chipbuilding

Whereas we just focused on the lower-level architecture, which in many ways was coming up with lots of bandwidth and lots of vector/ matrix compute resources. Perhaps the bigger marvel is how Ponte Vecchio went from those base units to a usable chip. Ponte Vecchio is made up of dozens of smaller tiles and chips, from different fabs and on different process nodes. Intel had to do a lot of innovation to turn this concept into something useful.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio New Innovations
Intel Architecture Day 2021 Xe HPC Ponte Vecchio New Innovations

In summary, we have a chip with over 100 billion transistors, 47 active tiles (e.g. excluding structural components) and using 5 different process nodes.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio SoC 1
Intel Architecture Day 2021 Xe HPC Ponte Vecchio SoC 1

As a point of reference, the NVIDIA A100 is basically a GPU die, six HBM2(e) packages, and the connective tissue between those components. This is NVIDIA’s current state of the art for reference.

NVIDIA Tesla A100 Package
NVIDIA Tesla A100 Package

As one can imagine that created new challenges.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Key Challenges
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Key Challenges

The compute tiles are made up of 8x Xe HPC cores. If you open the previous page with the architecture you can basically tie the components as we went from core to slice and so forth with the SoC components here. For example, we have 8x Xe Cores per tile. We know that we have sixteen of these cores in a slice and then as these slices become a stack we get L2 cache and there is a Rambo cache tile being shown. If you put both side-by-side they make a lot of sense.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Compute Tiles
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Compute Tiles

While the compute tile is based on TSMC N5, the Ponte Vecchio Base Tile is on Intel 7. We also see that we have 144MB of L2 cache and a PCIe Gen5 interface. Intel did not say that this is a CXL device, although CXL 1.1 is focused on PCIe Gen5 generation parts and the key use case as outlined in its Intel Hot Interconnects 2021 CXL Keynote is for GPUs and accelerators.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Base Tile
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Base Tile

We discussed Xe link but this is based on TSMC N7. Typically we see SerDes and switch tiles on older nodes as they can be harder to scale.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Link Tile
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Link Tile

Intel says that it is getting over 45 FP32 TFLOPS per GPU using early A0 silicon. During its presentation, Intel made it seem like these figures had room to go up in production versions. Intel also has some massive interconnect and memory bandwidth.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio A0 Silicon Performance
Intel Architecture Day 2021 Xe HPC Ponte Vecchio A0 Silicon Performance

Intel is showing OAM modules and we covered how Intel Xeon Sapphire Rapids Supports HBM and Ponte Vecchio OAM Update. OAM is a next-generation form factor designed for 600W+ accelerators from multiple companies as an answer to the NVIDIA SXM4 modules. Here is an OAM cooler for example.

Facebook Zion OAM Module With Cooling
Facebook Zion OAM Module With Cooling

These fit into larger assemblies that allow GPU-to-GPU links and very interesting topologies. For example, we saw the Inspur OAM UBB as an 8x GPU design and looked at others using the Universal Base Board (UBB.)

OCP OAM UBB Breakaway
OCP OAM UBB Breakaway

Intel will also sell four GPU assemblies in addition to the 8x GPU and 6x GPU configurations.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio OAM Nodes
Intel Architecture Day 2021 Xe HPC Ponte Vecchio OAM Nodes

We will quickly note that the Ponte Vecchio system being shown with Sapphire Rapids is being liquid-cooled. We recently did our Liquid Cooling Next-Gen Servers Getting Hands-on with 3 Options piece to help get our readers ready for next-year’s transition.

Even the Aurora Exascale supercomputer blade is liquid-cooled with 6x of these GPUs and two Sapphire Rapids CPUs. There is a not-so-subtle pattern afoot.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Aurora Blade
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Aurora Blade

A big part of Intel’s efforts is around software and oneAPI. Intel’s goal is to have programmers build code and then run it on the right acceleration platform as seamlessly as possible. The key here is that Intel is trying to make a CUDA competitor for the industry that works across more than just its Ponte Vecchio GPUs.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio And CPU OneAPI
Intel Architecture Day 2021 Xe HPC Ponte Vecchio And CPU OneAPI

Overall, this is a hugely ambitious project and Intel is integrating significantly more tiles from more process nodes and in a more complex fashion. The goal is that all of this chipbuilding work yields something that is a step function better instead of a 10-20% improvement.

Final Words

The entire industry should be nothing short of excited for Ponte Vecchio. Just the fact that Intel is producing this complex piece of engineering and getting awesome performance from it means that as a scientist and/ or customer, you should be excited. Intel is betting that its oneAPI software stack will help onboard developers and code from CUDA to Intel’s GPU compute. Still, having this massive of a GPU out there means that both NVIDIA and AMD are going to need the next generation of data center GPUs to have a step function of performance improvement. We know that AMD has something special with its next-generation parts that were the driving force behind AMD winning two of the first three Department of Energy Exascale systems. For NVIDIA, we may actually have a competition coming to the data center space.

Intel Architecture Day 2021 Ponte Vecchio OAM Module Raja
Intel Architecture Day 2021 Ponte Vecchio OAM Module Raja

Again, we are going to say that the level of integration of this chip is well beyond other products being sold on the market today. The fact that these are becoming commercial products means that the entire industry will be forced to change the way it designs chips. Like a spaceship that hurdles mankind into the heavens to new frontiers of exploration, Ponte Vecchio will thrust humanity into a new era of compute design.

6 COMMENTS

  1. “The Xe Link is the high-speed coherent unified fabric …”
    Previous presentations on Ponte Vecchio mentioned CXL, and I presumed the Sapphire Rapids chips would be responsible for coherency. I didn’t hear CXL mentioned for Ponte Vecchio today, nor for any of the other chips except Sapphire Rapids.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.