ASUS XA NB3I-E12 Review A Massive 8x NVIDIA B300 GPU Server

ASUS XA NB3I-E12 Performance

First off, we wanted to look at the Intel Xeon 6740P performance.

Intel Xeon 6740P Lscpu Output Large

These are interesting since they are 48-core CPUs. Oftentimes, we see 8x GPU systems with 64 or more cores, but some configure lower-core-count CPUs to achieve higher performance per core.

ASUS XA NB3I E12 CPU Tray 12 CPU Heatsink And DDR5 Angle

We used our AgentSTH V5 benchmark suite on this system. We tested it just before the Ubuntu 26.04 release, so we were not using the V7 suite yet. AgentSTH is a collaboration between STH and some folks at large hyperscalers to examine CPU performance for agentic AI workflows. We profiled what CPUs are actually doing during agentic workflows, and that helped build and weight the benchmark suite. Of course, for a system like this, you generally want the AI agents running on different systems, but we thought it would be more interesting here. If you want a good idea of traditional CPU performance, SPEC CPU 2026 is out.

Intel Xeon 6740P AgentSTH V5 Subtests_at_max

Something we have been doing is testing the CPUs not just at their maximum core counts, but also using standardized CPU sizes ranging from a single core to 32 cores. Not all portions of the profiled agentic AI workflows scale linearly as more cores are added, so we often see drop-offs in terms of performance as the core counts increase.

Intel Xeon 6740P AgentSTH V5 Subtest_scaling
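
One simple way to quantify that drop-off is scaling efficiency: the score at N cores divided by N times the single-core score. The sketch below is only an illustration. The function name and the scores are made-up placeholders, not AgentSTH results, and AgentSTH may summarize scaling differently.

```python
def scaling_efficiency(scores: dict[int, float]) -> dict[int, float]:
    """Return score(n) / (n * score(1)) for every core count n.

    1.0 means perfectly linear scaling; lower values mean the
    workload gets less out of each added core.
    """
    base = scores[1]
    return {n: s / (n * base) for n, s in sorted(scores.items())}


# Hypothetical placeholder scores keyed by core count (NOT measured data).
example_scores = {1: 100.0, 2: 190.0, 4: 360.0, 8: 640.0, 16: 1120.0, 32: 1920.0}

efficiency = scaling_efficiency(example_scores)
for cores, eff in efficiency.items():
    print(f"{cores:>2} cores: {eff:.2f} of linear")
```

With these placeholder numbers, the 32-core run lands at 0.60 of linear, which is the kind of drop-off the chart is meant to expose.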

Here is a fun one to look at. We ran a matrix ranging from a single instance spanning the entire system to sixteen smaller agent containers. You will notice that the 2x 48-core configuration actually performs better than the single 96-core instance, since the single instance has to span both CPUs over the socket-to-socket link.

Intel Xeon 6740P AgentSTH V5 Multi_agent_efficiency
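
If you want to try socket-local containers on a similar 2x 48-core system, here is a minimal sketch that computes core ranges so no container crosses the socket-to-socket link. It is not how the AgentSTH harness pins containers (the review does not say). The function name is hypothetical, and it assumes core IDs are numbered contiguously per socket, which you should verify against your own lscpu output.

```python
def socket_local_cpusets(sockets: int, cores_per_socket: int, containers: int) -> list[str]:
    """Split cores into equal containers so that no container spans two sockets.

    Assumption: core IDs are contiguous per socket (socket 0 owns
    0..cores_per_socket-1, socket 1 owns the next block, and so on).
    """
    if containers % sockets != 0:
        raise ValueError("containers must be a multiple of sockets")
    per_socket = containers // sockets
    if cores_per_socket % per_socket != 0:
        raise ValueError("cores_per_socket must divide evenly among containers")
    size = cores_per_socket // per_socket

    cpusets = []
    for s in range(sockets):
        for i in range(per_socket):
            start = s * cores_per_socket + i * size
            end = start + size - 1
            cpusets.append(str(start) if size == 1 else f"{start}-{end}")
    return cpusets


print(socket_local_cpusets(2, 48, 2))   # one 48-core container per socket
print(socket_local_cpusets(2, 48, 16))  # sixteen 6-core containers
```

The resulting strings are in the core-list format accepted by tools such as `taskset -c` or Docker's `--cpuset-cpus` option. A single container across the whole system is rejected on purpose, since it cannot be socket-local.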

Here you can see that, because not all workloads scale with increased cores, we get more aggregate CPU performance by using more containers with fewer cores per container. This makes logical sense, but it was still neat to see.

Intel Xeon 6740P AgentSTH V5 Multi_agent_aggregate
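
The arithmetic behind that observation can be sketched in a few lines: split a fixed core budget into equal containers and multiply the per-container score by the container count. The scores below are hypothetical placeholders rather than our measured results, and the model is idealized in that it assumes containers do not slow each other down.

```python
def aggregate_scores(total_cores: int, per_container_score: dict[int, float]) -> dict[int, float]:
    """Map container count -> aggregate score when total_cores is split evenly.

    per_container_score is keyed by cores per container.
    """
    result = {}
    for cores, score in per_container_score.items():
        if total_cores % cores == 0:
            count = total_cores // cores
            result[count] = count * score
    return dict(sorted(result.items()))


# Hypothetical scores keyed by cores per container (NOT measured data).
example = {96: 4000.0, 48: 2400.0, 24: 1400.0, 12: 800.0, 6: 450.0}

aggregate = aggregate_scores(96, example)
for count, total in aggregate.items():
    print(f"{count:>2} containers: aggregate {total:.0f}")
```

Because each smaller container in this made-up data loses less than half its score when its core count is halved, the aggregate keeps rising as the container count goes up.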

We have various subtest categories, such as throughput tests, coordination-style tasks, and those that are extremely memory-bandwidth bound. If you saw our Striking Back at AI Memory Pricing Using AI piece, this will look very familiar.

Intel Xeon 6740P AgentSTH V5 Dimensions_at_max

Here is the composite score as it scales with core count.

Intel Xeon 6740P AgentSTH V5 Composite_scaling

Just comparing the performance of this server to our reference Intel Xeon 6700 series platform across more traditional workloads, here is what we saw:

ASUS XA NB3I E12 Intel Xeon 6740P Performance

Overall, this shows that we are getting enough cooling to keep these processors running at their top speeds, even though we are in a GPU server.

Of course, this is a GPU compute server, so the NVIDIA Blackwell Ultra GPUs are the main attraction. We wanted to look at a big model, so we used Kimi K2.5.

ASUS XA NB3I E12 NVIDIA HGX B300 Nvidia Smi

While those are close, the big number to us was the Kimi K2.5 result. We ran it on the 8x GB10 cluster, but at just over 1 token/second/user. That was basically the buy-in to get it running.

With the massive NVIDIA Blackwell Ultra HBM3e memory pools, we were able to run it on only four GPUs, or better said, we could have vLLM running on two sets of B300 GPUs in the same system.
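
For readers who want to try a similar split, here is a hedged sketch of how two vLLM servers could be given four GPUs each. It only builds and prints the commands. The model identifier, ports, GPU index grouping, and the choice of tensor parallelism across four GPUs are all assumptions for illustration, not the exact settings used for this review, and a real deployment may need additional flags.

```python
import shlex

# Assumed model identifier; replace with whatever source you pull the model from.
MODEL_ID = "moonshotai/Kimi-K2.5"


def vllm_instance(gpus: list[int], port: int, model: str = MODEL_ID) -> tuple[dict[str, str], list[str]]:
    """Return (environment, command) for one vLLM server limited to the given GPUs."""
    # CUDA_VISIBLE_DEVICES restricts which GPUs the process can see.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    cmd = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(len(gpus)),  # assumed: shard across all visible GPUs
        "--port", str(port),                       # assumed port numbers
    ]
    return env, cmd


# Two independent instances, each on its own set of four GPUs (assumed grouping).
instances = [
    vllm_instance([0, 1, 2, 3], 8000),
    vllm_instance([4, 5, 6, 7], 8001),
]

for env, cmd in instances:
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    print(prefix, shlex.join(cmd))
```

Each printed line can be run in its own shell. Keeping the two instances on separate GPU sets and ports means they do not share GPU memory, which is what makes the "two sets of B300 GPUs" arrangement possible.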

ASUS XA NB3I E12 Kimi K2.5 Performance Tokens Per Second By Concurrency

That might not seem like a lot, but it clearly shows why having faster memory is so important. As an important piece of context, the 2x 4x GPU numbers come from running two instances of K2.5, so each instance was getting single-user concurrency. Realistically, you can either double up instances like we did, or run K2.5 and then run other models alongside it.

ASUS XA NB3I E12 Kimi K2.5 Performance Tokens Per Second Per User And Latency By Concurrency
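
To make the per-user and per-instance framing concrete, here is a small sketch of the arithmetic. The function names and the numbers are hypothetical and are not taken from the charts.

```python
def per_user_tokens_per_second(aggregate_tps: float, concurrency: int) -> float:
    """Average tokens/second each user sees when throughput is shared evenly."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return aggregate_tps / concurrency


def per_instance_concurrency(total_concurrency: int, instances: int) -> float:
    """Concurrent requests each instance handles when load is spread evenly."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return total_concurrency / instances


# Hypothetical example: 120 tokens/second aggregate shared by 4 users.
print(per_user_tokens_per_second(120.0, 4))

# Two requests spread over two instances means one user per instance.
print(per_instance_concurrency(2, 2))
```

This is why a dual-instance result at a total concurrency of two is effectively two single-user runs side by side.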

As a quick aside here, SGLang is faster at serving Kimi K2.5.

Next, let us discuss power consumption.
