NVIDIA DGX Spark Power Consumption
The power adapter that comes with the unit is a 240W USB-PD adapter.

When we did our power measurements last week, the system idled in the 40-45W range. Loading just the CPU, we could get 120-130W. Adding the GPU and other components, we could get to just under 200W, but we did not reach 240W. Something to keep in mind is that QSFP56 optics can use a decent amount of power on their own.
Also, in many of the AI inference workloads with LLMs we were pulling 60-90W and the system was very quiet. There is a fan running, but if you are 1-1.5m away it is very difficult to hear, and it never hit 40dBA when we were not stress testing the system.
Key Lessons Learned
When we first saw these systems, I immediately thought: “This is for AI developers.” In a way, that may be the case, since the limited memory bandwidth means big GPUs will still perform better. On the other hand, I actually think the answer is going to be much broader.

So far, I have done 86 flights in 2025. We have a Dell Pro Max 18 Plus notebook with a giant NVIDIA RTX Pro 5000 Blackwell mobile GPU that made it into the video. That is an awesome AI machine, but it is big and heavy, and not practical if you work from planes as much as I do. If, instead, you have a 14″ notebook or another thin-and-light machine and want an AI workstation, these are small enough that you can put them into your carry-on bag.

Aside from the portability, the 128GB of LPDDR5X memory and the ability to scale out to a cluster of systems using 200GbE RDMA networking mean that one can run high-end models on this system (see the network section for a note on that connectivity), even if they do not run very fast. That may seem silly at first, but if you are an executive who wants to try local AI on data you cannot send to a cloud provider, and you also want to be confident in the accuracy of the LLM you are using, then using a higher parameter count model is important. With 128GB, we can use those larger local models. What is more, executives who could not run larger, more accurate LLMs locally on thin and light laptops can now do so with easy-to-use small systems.
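As a rough sketch of what scaling across two of these boxes over the fast link might look like, here is a minimal two-node PyTorch all-reduce using torch.distributed with the NCCL backend. The interface name, addresses, and port are placeholders I am assuming for illustration, not a DGX Spark-specific recipe.

```python
# Minimal sketch: NCCL all-reduce across two nodes over the high-speed NIC.
# Assumptions: PyTorch with CUDA on both nodes, rank 0 reachable at the address
# below, and the 200GbE interface is named "enp1s0f0" (check with `ip link`).
import os
import torch
import torch.distributed as dist

def main():
    # Tell NCCL which interface to use for inter-node traffic (assumed name).
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "enp1s0f0")

    rank = int(os.environ["RANK"])              # 0 on the first node, 1 on the second
    world_size = int(os.environ["WORLD_SIZE"])  # 2 for a two-node cluster

    dist.init_process_group(
        backend="nccl",
        init_method="tcp://192.168.100.1:29500",  # rank 0's address (placeholder)
        rank=rank,
        world_size=world_size,
    )

    # Each node contributes a tensor; the all-reduce sums them across the link.
    x = torch.ones(1024, device="cuda") * (rank + 1)
    dist.all_reduce(x)
    print(f"rank {rank}: reduced value = {x[0].item()}")  # expect 3.0 with two ranks

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Run with RANK=0 on the first unit and RANK=1 on the second (WORLD_SIZE=2 on both); frameworks that split a model across nodes do the equivalent under the hood.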

There is, of course, a part of me that wants more than 273GB/s of memory bandwidth in a device like this. It makes the device more of a prototyping machine versus something like the NVIDIA RTX Pro 6000 Blackwell Edition. Make no mistake, the RTX Pro 6000 Blackwell Edition scaled out is much faster, but scaling these little systems out is really neat as well. There are going to be many folks who just want more performance because they are accustomed to that. This, though, is designed for prototyping on a machine you can throw in your carry-on suitcase or backpack, or even two of them in the bag, not for maximum performance.
Part of the significance is that this is “cool.” The other part is that the NVIDIA DGX Spark will democratize running large local models.
Final Words
We have only had this system for a few days, and let us be clear, I do not think this is the most mature NVIDIA platform out there in its pre-release or just-released state. It is, however, impossible for me to get past the feeling that this is a game-changer. We have 128GB of unified memory to run large models. We do not need another type of GPU or AI accelerator; this uses NVIDIA Blackwell. Instead of having to hack together a low-cost Thunderbolt network or something of that nature, you can just use a very high-end 200GbE RDMA NIC (again, see the note on the NIC connectivity). On the CPU side, this is also the best, albeit not cheap, Arm desktop/mini PC on the market by a long shot.

Even though we have more than one GB10 in the studio right now, if I had the opportunity to buy another, I would do it immediately. Sam was right that these are the “coolest mini PCs.” Beyond just being “cool” I also think they are a game-changer for local AI development. I would love it if they had double the memory bandwidth, but even as-is, they provide a new capability that did not exist with PCIe GPUs.
I think the NVIDIA DGX Spark will be another NVIDIA product with very high demand, so I predict it is going to be a challenge just to get one in the coming weeks.



It’s a flimsy mini PC with a somewhat decent GPU, a lot of VRAM, and 200GbE networking, which is pretty good when you want local AI for development. It’s a lot cheaper than buying an AI server.
My old 2023 M2 Max runs gpt-oss-120b. You quote 14.5 tok/s. Not a serious benchmark, but I typed what LM Studio thought was 104 tokens (it told me 170 after) and I got 53.5 tok/sec back. Hmm … “Here’s a nickel, kid, go and buy yourself a real computer!” I do appreciate your review though. Keep up the good work!
It is a nice system. Do you have any comparison against Strix Point Halo and other systems? Of course some systems are a little bit apples to oranges, but such data is helpful.
I get ~43 tokens/sec (293 prompt tok/s) with a fresh prompt on my Strix Halo computer with GPT-OSS-120B (MXFP4). Is that Spark running a different quantization or is there something else causing the bad performance (CUDA dropping features because it’s too recent, etc.)? On paper the Spark should have an edge because of the higher memory bandwidth.
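One way to make that comparison apples-to-apples would be to time the same GGUF quantization on both machines. A rough sketch with llama-cpp-python follows; the model path and prompt are placeholders, and whether this matches the review's exact build and settings is an assumption.

```python
# Sketch: rough decode-speed check for the same GGUF quant on two machines.
# Assumptions: llama-cpp-python built with GPU offload (CUDA on the Spark,
# ROCm or Vulkan on Strix Halo), and a local MXFP4 GGUF of gpt-oss-120b.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-MXFP4.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the difference between unified and dedicated GPU memory."
start = time.time()
out = llm(prompt, max_tokens=256)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```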
@Jack, how are you able to run 120B on Strix Halo? I thought Strix only had 32GB of memory.
@Paddy, how are you able to run 120B on an M2? I don’t think the RAM on an M2 can hold that big a model.
@Adam, Strix Halo with the 128GB shared memory, like the Framework Desktop and others. I believe they come in 32/64/128GB RAM variants. There are different types, but I think the AMD Ryzen AI Max+ 395, or whatever it’s called, is the only interesting one.
I believe there are articles about many of them on this site.
The Strix Halo 128GB boxes are half the price. I understand Patrick’s enthusiasm about 200GbE networking, but the “throw it in a suitcase for a demo” use case doesn’t make use of it. For clustering, I would think you need network storage that can keep up, and I’m sure someone will do it and make a YT video of it for the novelty, but I’m not sure the niche here over bigger server hardware is that wide.
So a lot of the value proposition seems to come down to whether you really want CUDA or not, which admittedly already commands a considerable premium.
@Oarman:
“The Strix Halo 128GB boxes are half the price.” And a GMKTec NucBox can go for as little as $125. So what? This device beats Strix Halo by 3% in single-core performance, 40% in multi-core performance and – if the author’s NVIDIA GeForce RTX 5070 comparison is correct – at least 25% in graphics performance, as the AMD Radeon 8060S in Strix Halo can’t even keep up with the GeForce RTX 4070 in most tasks. And of course, the networking performance is better.
Now it is up to you to decide whether substantially better CPU, graphics and networking performance is worth $2000. But it is more than just CUDA here. It isn’t as if you can buy 2 Corsair AI Workstation 300 boxes and hook them up to each other to get 100% – or even 25% – more performance.
@rano
If you look at the above, it only beat the power-limited Strix Halo box in CPU performance; it lost in multi-core when the Strix was given full power. In addition, the above says nothing about graphics performance, only noting that the INT4 performance was the equivalent of a 5070. The only other note about graphics was that it was flaky driving a standard HDMI monitor. As it is based on Blackwell AI chips, it may very well have a very nerfed graphics processor (they are, after all, not designed for graphics processing but instead for AI testing).
The network is certainly superior and having CUDA is certainly nice, but the gpt-oss performance is surprisingly poor.
@Ehren:
“it only beat the power limited Strix Halo box in the CPU performance”
Because it itself is power-limited. The DGX Spark is smaller than the GMKTec EVO-X2. Yes, there will soon be Strix Halo machines that aren’t limited by the mini-PC form factor, but the same will be true for Nvidia GB10 devices down the line.
“the above says nothing about graphics performance, only noting that the INT4 performance was the equivalent of a 5070”
Except that is about graphics performance.
“it was flaky driving a standard HDMI monitor”
Because it is a preproduction pre-release model running Nvidia’s DGX version of Ubuntu.
“As it is based on on Blackwell AI chips it may very well have a very nerfed graphics processor (they are, after all, not designed for graphics processing but instead AI testing).”
There is no such thing as “Blackwell AI chips”. They are just Blackwell chips used for professional applications just like their previous Ada Lovelace and Grace chips. The Blackwell Pro 6000 advertises itself as a workstation or high end desktop GPU, not an “AI chip.” Of course, this is nowhere near as powerful as a Blackwell Pro 6000, but the AMD Radeon 8060S is even further from an AMD Radeon Pro workstation/server GPU. (That being said, AMD’s 2026 integrated GPU is going to be way better, almost certainly good enough to match this one.)
https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
Both AMD and Apple fans are coming out of the woodwork to try to cut this FIRST GENERATION PRODUCT down a peg when both have been releasing their own productivity small form factor devices for years (except the Mac Mini isn’t that small). Hilarious.
These are definitely not for AI developers, as in people working on AI. They seem excellent as small, easy-to-set-up edge devices running private models in typical edge locations.
I doubt executives will buy these and put them in their C-suites when trying out local models. At half the price of a Blackwell Pro 6000, I also doubt that clustering them outside of said edge locations will be viable. And for the ambitious homelabber, clustering won’t play a major role, which means back to Strix Halo machines at half the price.
These will be neat to play with once they hit the secondhand market at reasonable prices (I would probably pay up to $500 for one of these). Investors and corps will be left holding the bag after buying into the ‘AI is a magic box that replaces human knowledge workers’ delusion.
The llama.cpp GitHub suggests that the poor performance is due to power-save settings. I’m not sure if there’s a module parameter for that or if it requires code updates, but there seems to be a way to make the performance match the specs, at least.
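For anyone who wants to check their own unit, here is a minimal sketch of how one might look for power-save settings, assuming a standard Linux cpufreq sysfs layout and nvidia-smi on the PATH; whether these are the exact settings the llama.cpp thread refers to is an assumption.

```python
# Sketch: look for power-save settings that could throttle inference.
# Assumptions: Linux cpufreq sysfs interface is present and nvidia-smi works.
import glob
import subprocess

def cpu_governors():
    """Return the set of active cpufreq governors (e.g. 'powersave', 'performance')."""
    paths = glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor")
    return {open(p).read().strip() for p in paths}

def gpu_clocks():
    """Return current vs. max SM clock as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.sm,clocks.max.sm", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    print("CPU governors:", cpu_governors() or "cpufreq not exposed")
    print("GPU SM clock (current, max):", gpu_clocks())
```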
I reserved a Spark Duo early in the Reservation window, and was notified last week that I would soon receive an email letting me know when I could finalise my order; the expectation being that I would receive my product ahead of general release.
15 Oct came (and went) with no notification.
So, I decided to just grab one from Micro Center (I can always get another down the line). Placed my order before sunrise, and purchased it later this morning.
It’s still in the box, as I have other priorities to attend to.
Anyone want a late, early release Reserve for a Duo (if I ever receive it, that is)?
Does anyone know why they keep mentioning only 200Gbps total throughput for what appear to be two QSFP112 ports, which should be capable of 400Gbps total? One way to check is to look at LnkSta in lspci and see if the PCIe design is limited to x8. If it shows 32GT/s and x16 for each port, there might be a better chance of doing 400Gbps with both ports connected. The IC itself could still be the limit, or maybe it is just a firmware limitation.
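For what it is worth, here is a minimal sketch of that check, assuming a Linux system where the ConnectX ports show up as network interfaces; the sysfs attribute names are standard, but which interfaces map to the QSFP ports is something you would have to confirm yourself.

```python
# Sketch: report the negotiated PCIe link speed/width behind each network
# interface, roughly the same information as the LnkSta line in `lspci -vv`.
# Assumptions: standard Linux sysfs; attributes readable by the current user.
import glob
import os

def read_attr(dev_path, name):
    try:
        with open(os.path.join(dev_path, name)) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

for iface_path in sorted(glob.glob("/sys/class/net/*")):
    dev = os.path.join(iface_path, "device")
    if not os.path.isdir(dev):
        continue  # skip virtual interfaces with no PCIe device behind them
    iface = os.path.basename(iface_path)
    speed = read_attr(dev, "current_link_speed")  # e.g. "32.0 GT/s PCIe"
    width = read_attr(dev, "current_link_width")  # e.g. "8" or "16"
    print(f"{iface}: {speed} x{width}")
```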
The docs state that the Spark supports multi-rail for the CX7 NIC ports, so you should at least be able to connect both ports in a load-balancing configuration.
Nice network testing, STH.
So, based on the recent network tests, it can only reach a maximum throughput of 100Gbps across both QSFP ports? That is strange since Nvidia is claiming it’s a 200GbE NIC.
> This is because unlike an AMD Strix Halo system, we have a unified memory system here so we do not have to do a 32GB CPU, 96GB GPU split. Instead, we have a big 128GB pool.
No! Strix Halo, like all AMD APUs, uses unified memory. The 32/96 split is only for Windows 3D games. On Linux I have no problem getting >126GB of RAM and using all of it on the GPU with HIP.
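As a quick illustration, here is a minimal sketch of checking that on a Strix Halo box, assuming a ROCm build of PyTorch (which exposes the HIP device through the torch.cuda API); how much can really be allocated depends on the amdgpu/GTT settings on your system.

```python
# Sketch: see how much of the unified memory pool the GPU can see and allocate.
# Assumptions: ROCm build of PyTorch; actual headroom depends on amdgpu/GTT limits.
import torch

assert torch.cuda.is_available(), "no HIP/ROCm device visible"

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"GPU-visible memory: {total_bytes / 2**30:.1f} GiB total, "
      f"{free_bytes / 2**30:.1f} GiB free")

# Try allocating well beyond the 96GB "split" people expect from Windows.
# Adjust the size to what your system can actually hold.
gib = 100
x = torch.empty(gib * 2**30 // 2, dtype=torch.float16, device="cuda")
print(f"allocated a {gib} GiB fp16 tensor on the GPU: {tuple(x.shape)}")
```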