AMD Instinct MI300X GPU and MI300A APUs Launched for AI Era


AMD Instinct MI300 Packaging Technologies

AMD calls this the 3.5D hybrid bond packaging technology. A big part of the new technology is co-packaging memory and compute to lower the energy used to shuttle data around the chip and system.

AMD Instinct MI300 Family Architecture 3.5D Packaging Gains

AMD is using a 7nm 3D V-Cache tile stacked on a base 5nm CCD. AMD says that having shipped millions of consumer and server chips with 3D V-Cache makes this much easier to integrate.

AMD Instinct MI300 Family Architecture SoIC

Here is the die stack. AMD has the HBM and the IO die on a silicon interposer and then uses 3D hybrid bonding to achieve higher interconnect density. This is a few steps beyond what NVIDIA is doing on its H100, H200, and GH200 parts.

AMD Instinct MI300 Family Architecture Chip Stack

One of the key enablers is that AMD is using CCDs, IO Dies, and more from other products and with small tweaks is able to use them on the MI300 family. A few months ago I told Forrest Norrod at AMD that I felt like his chiplet strategy was to make a few chiplets and options and then just mix and match to make targeted chips for different applications. He did not necessarily disagree with me.

AMD Instinct MI300 Family Architecture Chiplet Reuse

Here is the layout without the HBM3 on the edges. One of the challenges was creating the vertical wires in the IOD to stack the XCDs and CCDs on top. Also notable here are the R180 and Mirror blocks. These are where AMD has rotated a die 180 degrees or is using a mirrored version to get the rectangular geometries to work.

AMD Instinct MI300 Family Architecture IOD And Stack

BPVs (Bond Pad Vias) and TSVs (through-silicon vias) were important for getting this all to work. AMD went into a lot of detail on how it had to work with teams on different projects to ensure that the IP it was building, such as the IOD, could be leveraged for server CPUs as well as for the MI300 family.

AMD Instinct MI300 Family Architecture Connecting Chiplets

Here is a quick floorplan with the MI300A.

AMD Instinct MI300 Family Architecture Floorplan Power TSVs

Another big feature of the MI300A APU is that it can shift power between critical parts of the package depending on the workload, since different applications can stress memory more or the GPU more. AMD had to model these scenarios and figure out how to power and cool the large package. Folks know how to model and deal with chips like Genoa today; this 3.5D stacked package is cutting-edge for a production part, with perhaps only Intel Ponte Vecchio being more complex.

AMD Instinct MI300 Family Architecture Power Management And Heat Extraction

A lot had to go into the packaging to make this all work.

AMD ROCm 6 and AI Software Support

AMD has several AI software stacks: ROCm for its GPUs, ZenDNN for inference on its CPUs, and Vitis AI for parts like the Xilinx Kria KV260 FPGA-based Video AI Development Kit.

AMD ROCm ZenDNN Vitis AI Solutions

ROCm is really the key to getting the MI300 to stick with customers, since it is the primary software stack for the MI300X and MI300A. Most users may interact with higher-level frameworks, but those frameworks need to work (well) with the hardware. NVIDIA spends a ton of money on CUDA for exactly this reason.
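To illustrate why that framework-level support matters: ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` device interface that CUDA builds use, so most model code ports without changes. Here is a minimal sketch of a device-selection helper (`pick_device` is a hypothetical name for illustration; it assumes a PyTorch install, ROCm or CUDA, and falls back to CPU otherwise):

```python
import importlib.util


def pick_device() -> str:
    """Prefer a GPU when an installed framework exposes one, else use CPU."""
    if importlib.util.find_spec("torch") is not None:
        import torch
        # ROCm builds of PyTorch surface AMD GPUs through the familiar
        # torch.cuda interface (HIP underneath), so this one check covers
        # both NVIDIA and AMD hardware.
        return "cuda" if torch.cuda.is_available() else "cpu"
    # No framework installed at all: fall back to CPU.
    return "cpu"


print(pick_device())
```

The point of the sketch is that application code written against `torch.cuda` does not need an AMD-specific branch, which is exactly the kind of "it just works" behavior that decides whether the MI300 sticks with customers.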

AMD ROCm Software

We now have ROCm 6, the next-generation and more AI-focused release. It is easy to forget at an AI event that ROCm was originally launched with a focus on HPC.

AMD ROCm 6

AMD is touting ROCm 6’s optimizations for AI applications.

AMD ROCm LLM Optimizations 1

One part of the equation in using GPUs for AI is getting them to just work; we have heard this is mostly fixed at this point. The next is having competitive hardware, and AMD is certainly competitive in memory and compute. Finally, software is often where the big gains happen. NVIDIA gets a lot more performance out of its hardware over time through optimizations.

AMD ROCm LLM Performance Optimizations

AMD talked about the combined hardware plus software gains and says it can see 8x performance improvements.

AMD ROCm And Hardware Generational Improvements

Here is another AMD versus NVIDIA single-GPU inference comparison.

AMD ROCm Llama 2 Performance

AMD purchased Nod.ai and Mipsology to help build out its software stack.

AMD ROCm More Software

AMD is also touting its integration with popular frameworks.

AMD ROCm Developer Ecosystem

The basic message on the software side is that AMD works. The open question is about scaling and optimizations over time.

Next, let us show the end notes, and then wrap this up on the next page.

AMD End Notes

We also wanted to publish the end notes for the presentation since there are a lot of performance claims.

AMD Instinct MI300 Launch_Page_46

Here is another page.

AMD Instinct MI300 Launch_Page_47

Lots of small text here.

AMD Instinct MI300 Launch_Page_48

Here is another set of endnotes.

AMD Instinct MI300 Launch_Page_49

Here is yet another.

AMD Instinct MI300 Launch_Page_50

That may not be the most exciting, but we at least wanted to have these documented.

Next, let us get to our impact discussion.

7 COMMENTS

  1. It’s a good question which of the MI300A or MI300X is going to be more popular. As a GPU could the MI300X be paired with Intel or even IBM Power CPUs?

    I personally find the APU more interesting. Not because the design is new so much as the fact that real problems are often solved using a mixture of algorithms some of which work well on GPUs and others better suited to CPUs.

  2. I hope to see some uniprocessor MI300A systems hit the market. As of today only quad and octo.
    Maybe a sort of cube form factor, PSU on the bottom, then mobo and gigantic cooler on the top. A SOC compute monster.
