Our NVIDIA DGX Spark arrived less than a week ago, and if you thought it looked cool during NVIDIA GTC 2025, it is arguably even better in person, albeit with a few teething challenges. If you want a high-memory, NVIDIA-based mini AI workstation, this is it. Let me go a step further. This is also going to be a must-have (maybe more than one) for many AI developers. I also think it is a tool that is getting dangerously close to something I would recommend to almost every executive tasked with bringing AI into their organization. That is a pretty bold statement, and one that we will get into in a lot more detail in a future piece, but this little box is one that we keep looking at and thinking, “this is so cool.”
NVIDIA sent us a pre-production box to do this piece. I will note that I pre-ordered/reserved two of these with the copper DAC bundle immediately when I first saw it, and I still have not received the e-mail to purchase those, so this is what we have. You may have seen yesterday that we have the Dell Pro Max with GB10 as well. We were not allowed to show that system powered on alongside this one for this review because it has a different embargo date.
NVIDIA DGX Spark Hardware Overview
Something that has to be not just seen, but felt, to really believe is the size of the Spark. It is 150mm x 150mm x 50.5mm and just looks cool. When Sam was done taking photos, he walked over and said to me, “this is the COOLEST mini PC.” Just to give you some sense, we were in the middle of filming three AMD Strix Halo PCs, two GB10 systems including this one, an Intel-based system with a PCIe dock, and more. I have to say that I echo his sentiment.

The front has what looks like foam, but it is actually hard and allows airflow.

The bottom has a big vent and a large rubber pad. These systems sit solidly on desks.

On the sides and top, the system is just flat.

The sides are gold-colored metal, but that is about all we can say.

The rear is where the action really happens. This has everything from the power button to the I/O ports.

On the left rear, we have the power button, then a USB Type-C port for the USB PD power input. We then have three USB 3 20Gbps Type-C ports with DisplayPort alt mode. The next port is an HDMI port. It might be worth seeing our teething section for some of the display caveats.

On the networking side, we get a Realtek-based 10GbE port. Luckily, the driver comes with DGX OS, so we did not have to install it. The big feature is the NVIDIA ConnectX-7 NIC ports.

We will get into these soon, but these are 200GbE QSFP56 ports, which means that they are running four lanes of 56G/50Gbps PAM4. We went into this in our QSFP Versus QSFP-DD Here Are the Key Differences piece, but physical connectivity gets to be a challenge at higher speeds. These ports are central to the value proposition of the DGX Spark and GB10 systems in general. Since they are high-speed and support RDMA networking, the idea is that you can take a copper DAC, connect two systems together, and get even more compute and memory.
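To make that clustering idea concrete, here is a minimal sketch of a two-node all-reduce over the QSFP link using PyTorch's NCCL backend. This is not NVIDIA's own clustering tooling, just an illustration; the interface name, IP address, and port below are placeholders you would swap for your own setup.

```python
# Sketch: two DGX Sparks doing a NCCL all-reduce over the ConnectX-7 link.
# Assumptions: PyTorch with CUDA/NCCL on both boxes; the QSFP interface
# name, IP address, and port below are placeholders for your own network.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_SOCKET_IFNAME", "enp1s0f0np0")  # hypothetical QSFP interface name

dist.init_process_group(
    backend="nccl",
    init_method="tcp://192.168.100.1:29500",   # rank-0 Spark's address on the 200GbE link
    rank=int(os.environ["RANK"]),              # 0 on the first Spark, 1 on the second
    world_size=2,
)

x = torch.ones(1 << 26, device="cuda")  # ~256MB of fp32 on each GPU
dist.all_reduce(x)                      # summed across both Sparks over the link
print(x[0].item())                      # prints 2.0 if both nodes participated
dist.destroy_process_group()
```

Run it on each Spark with RANK=0 and RANK=1 respectively; NCCL will use RDMA over the ConnectX-7 ports when it is available and fall back to TCP otherwise.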
Next, let us get into how this system works with the GB10, including some of the features of those 200GbE ports.




It’s a flimsy mini PC with a somewhat decent GPU, a lot of VRAM, and 200GbE networking, which is pretty good when you want local AI for development. It’s a lot cheaper than buying an AI server.
My old 2023 M2 Max runs gpt-oss-120b. You quote 14.5 tok/s. Not a serious benchmark, but I typed what LM Studio thought was 104 tokens (it told me 170 after) and I got 53.5 tok/sec back. Hmm … “Here’s a nickel, kid, go and buy yourself a real computer!” I do appreciate your review though. Keep up the good work!
It is a nice system. Do you have any comparisons against Strix Halo and other systems? Of course, some comparisons are a little bit apples to oranges, but such data is helpful.
I get ~43 tokens/sec (293 prompt tok/s) with a fresh prompt on my Strix Halo computer with GPT-OSS-120B (MXFP4). Is that Spark running a different quantization or is there something else causing the bad performance (CUDA dropping features because it’s too recent, etc.)? On paper the Spark should have an edge because of the higher memory bandwidth.
@Jack, how are you able to run 120B on Strix Halo? I thought Strix only had 32GB of memory.
@Paddy, how are you able to run 120b on an M2? I don’t think the RAM on an M2 can hold that big a model.
@Adam, Strix Halo with the 128GB of shared memory, like the Framework Desktop and others. I believe they come in 32/64/128GB RAM variants. There are different types, but I think the AMD AI Max+ 395, or whatever it’s called, is the only interesting one.
I believe many of them have articles about them on this site.
The Strix Halo 128GB boxes are half the price. I understand Patrick’s enthusiasm about 200GbE networking, but the “throw it in a suitcase for a demo” use case doesn’t make use of it. For clustering, I would think you need network storage that can keep up, and I’m sure someone will do it and make a YT video of it for the novelty, but I’m not sure the niche here over bigger server hardware is that wide.
So a lot of this value proposition seems to come down to whether you really want CUDA or not, which admittedly already commands a considerable premium.
@Oarman:
“The Strix Halo 128GB boxes are half the price.” And a GMKtec NucBox can go for as little as $125. So what? This device beats Strix Halo by 3% in single-core performance, 40% in multi-core performance, and (if the author’s NVIDIA GeForce RTX 5070 comparison is correct) at least 25% in graphics performance, as the AMD Radeon 8060S in Strix Halo can’t even keep up with the GeForce RTX 4070 in most tasks. And of course, the networking performance is better.
Now it is up to you to decide whether substantially better CPU, graphics, and networking performance is worth $2,000. But it is about more than just CUDA here. It isn’t as if you can buy two Corsair AI Workstation 300 boxes and hook them up to each other to get 100%, or even 25%, more performance.
@rano
If you look at the above, it only beat the power-limited Strix Halo box in CPU performance; it lost in multi-core when the Strix was provided with full power. In addition, the above says nothing about graphics performance, only noting that the INT4 performance was the equivalent of a 5070. The only note above about graphics was that it was flaky driving a standard HDMI monitor. As it is based on Blackwell AI chips, it may very well have a very nerfed graphics processor (they are, after all, designed for AI rather than graphics processing).
The network is certainly superior, and having CUDA is certainly nice, but the gpt-oss performance is surprisingly poor.
@Ehren:
“it only beat the power-limited Strix Halo box in CPU performance”
Because it itself is power-limited. The DGX Spark is smaller than the GMKtec EVO-X2. Yes, there will soon be Strix Halo machines that aren’t limited by the mini-PC form factor, but the same will be true for Nvidia GB10 devices down the line.
“the above says nothing about graphics performance, only noting that the INT4 performance was the equivalent of a 5070”
Except that is about graphics performance.
“it was flaky driving a standard HDMI monitor”
Because it is a pre-production, pre-release model running Nvidia’s DGX version of Ubuntu.
“As it is based on Blackwell AI chips, it may very well have a very nerfed graphics processor (they are, after all, designed for AI rather than graphics processing).”
There is no such thing as “Blackwell AI chips.” They are just Blackwell chips used for professional applications, just like their previous Ada Lovelace and Grace chips. The Blackwell Pro 6000 advertises itself as a workstation or high-end desktop GPU, not an “AI chip.” Of course, this is nowhere near as powerful as a Blackwell Pro 6000, but the AMD Radeon 8060S is even further from an AMD Radeon Pro workstation/server GPU. (That being said, AMD’s 2026 integrated GPU is going to be way better, almost certainly good enough to match this one.)
https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
Both AMD and Apple fans are coming out of the woodwork to try to cut this FIRST GENERATION PRODUCT down a peg when both have been releasing their own productivity small form factor devices for years (except the Mac Mini isn’t that small). Hilarious.
These are definitely not for AI developers, as in people working on AI. They seem excellent at being small, easy-to-set-up edge devices running private models in typical edge locations.
I doubt executives will buy these and put them in their C-suites when trying out local models. At half the price of a Blackwell Pro 6000, I also doubt that clustering them outside of said edge locations will be viable. And for the ambitious homelabber, clustering won’t play a major role, which means back to Strix Halo machines at half the price.
These will be neat to play with once they hit the secondhand market at reasonable prices (I would probably pay up to $500 for one of these). Investors and corps will be left holding the bag after buying into the ‘AI is a magic box that replaces human knowledge workers’ delusion.
The llama.cpp GitHub suggests that the poor performance is due to powersave settings. I’m not sure if there’s a module parameter for that or if it requires code updates, but there seems to be a way to make the performance match the specs, at least.
I reserved a Spark Duo early in the Reservation window, and was notified last week that I would soon receive an email letting me know when I could finalise my order; the expectation being that I would receive my product ahead of general release.
15 Oct came (and went) with no notification.
So, I decided to just grab one from MicroCenter (I can always get another down the line). I placed my order before sunrise and purchased it later this morning.
It’s still in the box, as I have other priorities to attend to.
Anyone want a late “early release” reservation for a Duo (if I ever receive it, that is)?
Does anyone know why they keep mentioning only 200Gbps throughput total for what appear to be two QSFP112 ports, which should be capable of 400Gbps total? One way to check (see the sketch below) is to look at LnkSta and see if the PCIe design is limited to x8. If it shows 32GT/s and x16 for each port, there might be a better chance of doing 400Gbps with both ports connected. The IC itself could still be limiting, or maybe it is just a firmware limitation.
The docs state that the Spark supports multi-rail for the CX7 NIC ports, so you should at least be able to connect both ports in a load-balancing configuration.
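For what it is worth, one way to sanity-check the PCIe side from Linux, assuming the ConnectX-7 functions show up as ordinary PCIe devices, is to read the negotiated link width and speed from sysfs. This mirrors the LnkSta fields in lspci -vv; the only assumption baked in is the standard Mellanox/NVIDIA vendor ID.

```python
# Sketch: report negotiated PCIe link width/speed for the ConnectX-7 ports.
# Equivalent information to the LnkSta line in "lspci -vv".
from pathlib import Path

MELLANOX_VENDOR_ID = "0x15b3"  # Mellanox/NVIDIA networking vendor ID

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    try:
        vendor = (dev / "vendor").read_text().strip()
        if vendor != MELLANOX_VENDOR_ID:
            continue
        width = (dev / "current_link_width").read_text().strip()
        speed = (dev / "current_link_speed").read_text().strip()
    except OSError:
        continue  # some functions may not expose link attributes
    print(f"{dev.name}: x{width} @ {speed}")
```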
Nice network testing, STH.
So, based on the recent network tests, it can only reach a maximum throughput of 100Gbps across both QSFP ports? That is strange, since Nvidia is claiming it is a 200GbE NIC.
> This is because unlike an AMD Strix Halo system, we have a unified memory system here so we do not have to do a 32GB CPU, 96GB GPU split. Instead, we have a big 128GB pool.
No! Strix Halo, like all AMD APUs, uses unified memory. The 32/96 split is only for Windows 3D games. On Linux, I have no problem getting >126GB of RAM and using all of it on the GPU with HIP.
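A minimal sketch of what that looks like with a ROCm build of PyTorch (which goes through HIP underneath); the 100 GiB allocation size is purely illustrative, and the actual headroom depends on how the amdgpu GTT limit is configured on a given system:

```python
# Sketch: grab a large chunk of the unified pool on a Strix Halo iGPU with
# ROCm PyTorch (HIP underneath). The 100 GiB figure is illustrative; real
# headroom depends on the amdgpu GTT limit configured on your system.
import torch

assert torch.cuda.is_available()  # ROCm builds reuse the "cuda" device name

GIB = 1024 ** 3
n_elems = 100 * GIB // 2  # float16 is 2 bytes per element -> ~100 GiB
buf = torch.empty(n_elems, dtype=torch.float16, device="cuda")
print(f"Allocated {buf.numel() * buf.element_size() / GIB:.0f} GiB on "
      f"{torch.cuda.get_device_name(0)}")
```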