Building the 8x NVIDIA GB10 Cluster: Key Setup Steps
Perhaps the coolest thing about using a system like this is that it uses NVIDIA’s NCCL and other libraries. On the one hand, that makes everything straightforward to set up. On the other hand, the complexity of setting this up is well beyond operating a single GPU, or a few GPUs, in a single system.

Here are some of the key steps you need to do:
- Physically connect all of the systems and switches.
  - Ensure that you are using the same ConnectX-7 ports on the back of each GB10 as that will make your life much easier when it comes to managing the networking.
  - If you are not using WiFi, turn those radios off.
  - Document all of the connections.
- Update all of the firmware across the cluster.
  - Firmware changes can change performance significantly, especially if one or more nodes are on different firmware versions.
  - Once you are running the cluster, updating a node or nodes and rebooting will cause model downtime.
- On the MikroTik CRS812 DDQ, set up MTU, PFC, ECN, and all of the QoS bits needed. This still requires going into the CLI.
- Ensure that the ConnectX-7 NICs on each node are set up and can use RDMA and NCCL and also have the correct MTU to match the MikroTik switch.
- Speed test the RDMA networking.
  - If you are getting ~10Gbps, that is likely because a node is going out over the 10GbE network.
  - Make sure to do bi-directional testing.
- Ensure the 10Gbase-T NICs are handling management.
- Set up shared NAS storage.
  - You can also cluster storage onboard each node, but given model sizes, it will be easier to just use a NAS.
  - Make sure each node tests access to the NAS and shared model storage directories.
  - Ensure you are getting the expected speed for the NAS.
- Get vLLM working (or another serving platform that you can cluster) on all of the nodes.
  - Containers can make this a lot easier.
  - Make sure each node is running the same vLLM version.
  - Test connectivity for vLLM between nodes, and ensure it is using the ConnectX-7 path, not the 10Gbase-T path.
  - Smoke test running the model.
- Set up some kind of monitoring solution. We will have more on this in the next section.
- Document everything as you will need it later.
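
Several of the checks in the list above (matching MTU, identical vLLM versions, and spotting a link that is accidentally running over 10GbE) are easy to script once you have collected the numbers from each node. Here is a minimal Python sketch of that idea, not the tooling we used. The `NodeReport` fields, the `preflight` function, and the 12 Gbps threshold are all hypothetical names and values picked for illustration, and how you gather the data (SSH, an agent, a monitoring exporter) is left out.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Facts collected from one node. Collection itself is out of scope here."""
    name: str
    rdma_mtu: int        # MTU configured on the ConnectX-7 interface
    vllm_version: str    # version string reported by the node's vLLM install
    rdma_gbps: float     # measured node-to-node throughput in Gbps


def preflight(reports, expected_mtu, slow_path_gbps=12.0):
    """Return a list of problem descriptions; an empty list means no issues found.

    expected_mtu is the host MTU you chose to go with the switch config.
    slow_path_gbps is an assumed cutoff: anything at or below it is treated
    as a sign that traffic is using the 10GbE management network.
    """
    problems = []
    for r in reports:
        if r.rdma_mtu != expected_mtu:
            problems.append(f"{r.name}: MTU {r.rdma_mtu}, expected {expected_mtu}")
        if r.rdma_gbps <= slow_path_gbps:
            problems.append(
                f"{r.name}: {r.rdma_gbps:g} Gbps looks like the 10GbE path, "
                "not ConnectX-7"
            )
    # Every node should report the same vLLM version.
    versions = {r.vllm_version for r in reports}
    if len(versions) > 1:
        problems.append(
            "vLLM versions differ across nodes: " + ", ".join(sorted(versions))
        )
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    demo = [
        NodeReport("node-1", 9000, "version-a", 100.0),
        NodeReport("node-2", 1500, "version-b", 9.4),
    ]
    for line in preflight(demo, expected_mtu=9000):
        print(line)
```

Keeping the checks as a pure function over collected data means the same code can run from a laptop, a cron job, or an AI agent's tool call, and it is easy to extend with NAS throughput or firmware versions.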
Those are the high-level steps to get a cluster like this working. Many of our readers are going to look at that punch list and think, “That is a Saturday job, no problem.” Others are going to go into panic mode because there is a lot there and it crosses domains like servers, networking, storage, and AI platforms. Here is the fun bit: in 2026, you do not need to know any of that.

Instead, you can take the list above, give it to Claude Code, Codex, or even OpenClaw/Hermes with a decent model behind it, and just have an AI agent do it for you. I know, giving 1TB of memory, 160 Arm cores, several TB of storage, and eight Blackwell GPUs to an AI agent sounds a bit like how Skynet started. That is fair, but it worked. Given the pace of AI evolution, this was possible in February 2026, and it would be much easier now that there are more models that are better at tool calling.
The last part is the monitoring, so let us get to that next.



This is f*n awesome. Good on ya bro
BEST piece you’ve done recently. Wow. It makes me only want a 4N not 8N
This is indeed one of your best articles in years. My 2-node cluster will keep me entertained for a long time. Maybe one day I will go up to 4 nodes, we shall see.
The MikroTik CRS804 is perfect for a 4-node cluster, 4x200G for the RoCE, 4x100G for storage, and the last 400G port facing the NAS.
Have you tried implementing TurboQuant on the cluster? I am curious to see how long context windows impact the available vram over time and how TurboQuant might help.
Great article especially showing how you leveraged ai to set everything up!
Repost from the STH Forums, not sure which spot is better for responses:
Fantastic article. I’m running 4N right now, was/am still considering a drop to 2N since the extra 2N aren’t necessary for all models, but your point about the fungibility of having more/spare nodes is excellent.
Very interesting to see the callout on using a flash NAS for shared storage – would love to hear more on best practices for this, especially the situations where a little more GPU in the NAS makes sense. Next click stop for my cluster is the addition of a flash NAS based on the ARR 1U E1S you guys reviewed a while back, but I hadn’t even considered putting a GPU into it. Would love more details on this.
Also, if you’re taking requests, a network diagram would be fantastic. I’m running 2x 804DDQs to handle the 4N + flash + uplinks + mgmt, but I’m nowhere near settled on the topology. Would love to see how you ended up structuring the full cluster network including NAS & DDQ trunks / uplink.
Lastly, WTH was that Ubiquiti 10G switch you were showing?! It definitely wasn’t the Pro XG-8 PoE (no screen) and AFAIK there are no other UI 10G switches that aren’t rack mounted. I’m pretty familiar with the UI lineup and the only unit I know that looks like what you showed was the Enterprise 8 PoE (Vintage) model!
What tool or service did you use to get the stats for all the power and servers
I thought your SM Xeon 6 SOC review was unreal good. This is even better. I think I’m more sold on a 4N, not an 8N, but the M3 Ultra’s prefill is doggy doo doo. I’m lovin’ your new articles Patrick.
One that I wish you’d done is the QNAP 100G and 25G switch. If you’re only getting 140G then maybe it’d be better to just do 100G
I don’t get the tube comments where they’re so dead set on being the permanent underclass.
Use basic punctuation in your titles. I had a stroke trying to read it.
@El Porto & Peter – Patrick touched on it briefly in the video, but the general idea is the GB10 ConnectX-7 interfaces (and the CRS804 switch) are purely for inter-node coordination as they are running RoCE.
NAS and management traffic are all on the 10G NICs.
Deviating from that pattern would likely degrade model performance.
Regarding cluster size, I’ve been pretty happy with a 2 node with a simple DAC. An alternative (cheaper) way to scale out could simply be to add a totally separate 2n cluster, and load balance the LLM API requests.
I read about 8. Now I’m ordering a 4-node cluster.
Look at high-capacity DIMM pricing. If you get the 1TB you’re paying for the 128GB, maybe the SSD, but then the CPU, GPU, and NIC are free. These aren’t getting any cheaper. You aren’t going to get a better deal on this much VRAM.
Apple stopped selling the studio 512 because the spot pricing of the memory alone is approaching $15k.
Awesome writeup. STH is crushing it!
Has anyone been able to PXE boot their DGX Spark? I want to provision it with MAAS but it always hangs. Is it possible to PXE from the ConnectX instead of the Realtek?
Hi!
“On the MikroTik CRS812 DDQ, set up MTU, PFC, ECN, and all of the QoS bits needed.”
What is your RouterOS version? AFAIK PFC and ECN are not yet supported in v7.21.4 (long term) or v7.22.2 (stable), and are mandatory (?) functions for lossless RDMA connections.
Are you using v7.23b/rc?
Thank you
Amazing work!!