BIG AI Cluster Little Power the 8x NVIDIA GB10 Cluster

Building the 8x NVIDIA GB10 Cluster: Key Setup Steps

Perhaps the coolest thing about a system like this is that it uses NVIDIA’s NCCL and other standard libraries. On the one hand, that makes everything straightforward to set up. On the other hand, setting this up is far more complex than operating a single GPU, or a few GPUs, in a single system.

Gigabyte AI TOP ATOM NVIDIA GB10 Front 1

Here, we are just going to cover some of the key steps:

  • Physically connect all of the systems and switches.
    • Ensure that you are using the same ConnectX-7 ports on the back of each GB10, as that will make managing the networking much easier.
    • If you are not using WiFi, turn those radios off.
    • Document all of the connections.
  • Update all of the firmware across the cluster.
    • Firmware updates can change performance significantly, especially if one or more nodes are on a different firmware version.
    • Once you are running the cluster, updating a node or nodes and rebooting will cause model downtime.
  • On the MikroTik CRS812 DDQ, set up MTU, PFC, ECN, and all of the QoS bits needed. This still requires going into the CLI.
  • Ensure that the ConnectX-7 NICs on each node are set up, can use RDMA and NCCL, and have an MTU that matches the MikroTik switch.
  • Speed test the RDMA networking.
    • If you are getting ~10Gbps, that is likely because a node is going out over the 10GbE network.
    • Make sure to do bi-directional testing.
  • Ensure the 10Gbase-T NICs are handling management.
  • Set up shared NAS storage.
    • You can also cluster the storage onboard each node, but given model sizes, it will be easier to just use a NAS.
    • Test that each node can access the NAS and the shared model storage directories.
    • Ensure you are getting the expected speed from the NAS.
  • Get vLLM working (or another serving platform that you can cluster) on all of the nodes.
    • Containers can make this a lot easier.
    • Make sure each node is running the same vLLM version.
    • Test connectivity for vLLM between nodes, and ensure it is using the ConnectX-7 path, not the 10Gbase-T path.
    • Smoke test by running the model.
  • Set up some kind of monitoring solution. We will have more on this in the next section.
  • Document everything as you will need it later.
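
To make a few of those checks concrete, here is a minimal Python sketch of the sort of preflight logic you might script once you have gathered the data from each node. Everything specific in it is an assumption for illustration: the function names, the node names, the 9000-byte MTU, the 200Gbps expected link speed, and the 80% "degraded" threshold are all hypothetical, and it assumes node and switch MTUs are expressed in the same terms. It only compares values you have already collected; it does not talk to the switch, the NICs, or vLLM.

```python
from collections import Counter


def find_mtu_mismatches(node_mtus: dict[str, int], switch_mtu: int) -> list[str]:
    """Return the nodes whose RDMA NIC MTU differs from the switch MTU."""
    return sorted(node for node, mtu in node_mtus.items() if mtu != switch_mtu)


def find_version_outliers(node_versions: dict[str, str]) -> list[str]:
    """Return the nodes that do not run the most common version string."""
    if not node_versions:
        return []
    majority, _count = Counter(node_versions.values()).most_common(1)[0]
    return sorted(node for node, ver in node_versions.items() if ver != majority)


def classify_link_speed(
    measured_gbps: float,
    expected_gbps: float,
    mgmt_gbps: float = 10.0,
) -> str:
    """Roughly bucket a node-to-node throughput measurement.

    "management-path": at or below the management network speed, so the
        traffic is probably leaving over the 10GbE port by mistake.
    "degraded": faster than management, but under 80% of the expected rate
        (the 80% cutoff is an arbitrary assumption).
    "ok": at least 80% of the expected rate.
    """
    if measured_gbps <= mgmt_gbps * 1.05:
        return "management-path"
    if measured_gbps < 0.8 * expected_gbps:
        return "degraded"
    return "ok"


if __name__ == "__main__":
    # Hypothetical inventory; replace with values collected from your nodes.
    mtus = {"gb10-1": 9000, "gb10-2": 9000, "gb10-3": 1500}
    versions = {"gb10-1": "A", "gb10-2": "A", "gb10-3": "B"}
    speeds = {"gb10-1": 185.0, "gb10-2": 9.4, "gb10-3": 120.0}

    print("MTU mismatches:", find_mtu_mismatches(mtus, switch_mtu=9000))
    print("Version outliers:", find_version_outliers(versions))
    for node, gbps in speeds.items():
        print(node, classify_link_speed(gbps, expected_gbps=200.0))
```

Run the speed classification on results from both directions of each link, since a path can look fine one way and fall back to the management network the other way.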

Those are the high-level steps to get a cluster like this working. Many of our readers are going to look at that punch list and think, “That is a Saturday job, no problem.” Others are going to go into panic mode because there is a lot there and it crosses domains like servers, networking, storage, and AI platforms. Here is the fun bit: in 2026, you do not need to know any of that.

Lenovo PGX NVIDIA GB10 Rear 1

Instead, you can take the list above, give it to Claude Code, Codex, or even OpenClaw/Hermes with a decent model behind it, and just have an AI agent do it for you. I know, giving 1TB of memory, 160 Arm cores, several TB of storage, and eight Blackwell GPUs to an AI agent sounds a bit like how Skynet started. That is fair, but it worked. Given the pace of AI evolution, this was possible in February 2026, and it would be much easier now that there are more models that are better at tool calling.

The last part is the monitoring, so let us get to that next.

4 COMMENTS

  1. This is indeed one of your best articles in years. My 2-node cluster will keep me entertained for a long time. Maybe one day I will go up to 4 nodes, we shall see.
    The MikroTik CRS804 is perfect for a 4-node cluster, 4x200G for the RoCE, 4x100G for storage, and the last 400G port facing the NAS.

  2. Have you tried implementing TurboQuant on the cluster? I am curious to see how long context windows impact the available vram over time and how TurboQuant might help.

    Great article especially showing how you leveraged ai to set everything up!
