Gigabyte G893-ZX1-AAX2 AMD Instinct MI325X Server Review

3

Gigabyte G893-ZX1-AAX2 Topology and Block Diagram

Installed in the system we had two 192 core AMD EPYC 9965 CPUs. That made for a massive system loaded with GPUs, storage, and NICs. This is one of the bigger topology maps we have ever seen.

Gigabyte MI325X Topology
Gigabyte MI325X Topology

For the system’s block diagram, perhaps most interesting part is how the PCIe lanes are distributed.

Gigabyte G893-ZX1-AAX2 Block Diagram
Gigabyte G893-ZX1-AAX2 Block Diagram

This system has six PCIe switches with the four Broadcom PEX89104 switches handling the CPU-GPU-NIC-NVMe topology. Then there are two Broadcom PEX89048 chips to handle the CPUs to the North-South network links. Those PEX89048’s are not directly connected to the GPUs and storage.

Gigabyte G893-ZX1-AAX2 Management

On the management side, we get an ASPEED AST2600 BMC running MegaRAC SP-X. This is an industry standard management interface.

Gigabyte G893-ZX1-AAX2-000 MegaRAC SP-X Dashboard
Gigabyte G893-ZX1-AAX2-000 MegaRAC SP-X Dashboard

On these systems, the number of components tends to be higher since we have so many fans, and even things like twelve power supplies. That means that the management interface has a lot more to cover in terms of monitoring and logging.

Gigabyte G893-ZX1-AAX2 Management Sensor Readings
Gigabyte G893-ZX1-AAX2 Management Sensor Readings

Still, we get standard features like an HTML5 iKVM. Here we can see our eight AMD Instinct MI325X GPUs idling at 122-137W using rocm-smi.

Gigabyte G893-ZX1-AAX2-000 HTML5 iKVM ROCm SMI
Gigabyte G893-ZX1-AAX2-000 HTML5 iKVM ROCm SMI

All of this will be familiar to those who have used SP-X on a Gigabyte server or other servers before.

Gigabyte G893-ZX1-AAX2 Performance

In terms of performance, we had dual 192 core AMD EPYC 9965 CPUs for a total of 384 cores and 768 threads. There are often a few schools of thought regarding CPUs for these systems. Some go with the highest core counts, which we have here. Others go for lower core count higher frequency parts.

AMD EPYC 9965 lscpu output
AMD EPYC 9965 lscpu output

Still, we wanted to validate that we were getting the performance we would expect from the CPUs so we compared the performance to our reference 2U server:

Gigabyte G893-ZX1-AAX2 AMD EPYC 9965 Performance to Baseline
Gigabyte G893-ZX1-AAX2 AMD EPYC 9965 Performance to Baseline

One of the big challenges in these big AI servers is keeping the CPUs cool. Gigabyte is doing a good job in this area.

AMD EPYC 9965 Gigabyte G893 ZX1 AAX2 Large
AMD EPYC 9965 Gigabyte G893 ZX1 AAX2 Large

On the GPU side, we have been testing a number of these systems. We used vllm for Llama 3.1 and sglang for Deepseek-R1. On the LLama 3.1 side we used 128 input tokens and 2048 output tokens since that tends to be a sweet spot we found. We also used a batch size of 8 for our latency comparisons. On the Deepseek-R1 we were using the same 128/2048 and at a concurrency of 64 with the mean inter-token latency as our latency comparsion point. Performance was on par with other systems.

Gigabyte G893-ZX1-AAX2 AMD Instinct MI325X Performance to Baseline
Gigabyte G893-ZX1-AAX2 AMD Instinct MI325X Performance to Baseline

Directionally, these were in-line with other systems with the AMD Instinct MI325X that we have tested. Something to keep in mind is that we tested these systems pre-ROCm 7. AMD is making big strides in its software stack, so performance using a modern stack should be faster than what we are getting.

Next, let us talk about power.

3 COMMENTS

  1. Is there any concern with regards to performance when using these pci switches?
    ex. Any potential for oversubscribing and causing congestion avoidance protocols to kick in?

    How does this topology compare to a typical dgx system regarding the issue above?
    Is there the same potential for oversubscription with similar Nvidia based servers?

  2. Kawaii, the unpopulated motherboard has different layout too. Most obvious is the m.2 slot adjacent to the middle bank of DIMMS on the unpopulated board.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.