It looks like the xAI Colossus team has received what appears to be a Dell NVIDIA GB200 system. Based on reflections in the photo, it looks like an NVIDIA GB200 NVL72 platform. Uday Ruddarraju at xAI posted a picture on X today showing dual-tray compute nodes and NVLink switch trays.
Christmas Came Early at xAI Colossus NVIDIA GB200 Shown
Here is the photo shared on X:
Christmas arrived early at @xai's Colossus! pic.twitter.com/OC6xf4ZGX4
— Uday Ruddarraju (@rudaykumarraju) December 18, 2024
There are a few obvious observations here. First, the compute nodes are not hooked up to the high-speed networking yet: the pluggable optics are not installed, and we do not see fiber installed. The low-speed management networks do appear to be connected. Second, the tray and bezel design indicates this is a Dell GB200 NVL system. It does not have the layout of the NVL4, and it is more likely an NVL72 system like the Dell PowerEdge XE9712. We can also count at least 7-8 NVLink switch trays between what is directly visible in the photo and the reflection off of the xAI Christmas ornament, with the photo taken from a low angle. Our best guess is that this is a Dell GB200 NVL72 system.
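As a rough sanity check on the tray count, here is a quick sketch using NVIDIA's published GB200 NVL72 rack numbers (these figures come from NVIDIA's spec sheets, not anything confirmed in the photo itself):

```python
# Back-of-envelope check: does the tray count visible in the photo
# match NVIDIA's published GB200 NVL72 rack layout?
COMPUTE_TRAYS = 18        # NVL72 rack: 18 compute trays
SUPERCHIPS_PER_TRAY = 2   # each tray holds 2 GB200 superchips
GPUS_PER_SUPERCHIP = 2    # each superchip: 1 Grace CPU + 2 Blackwell GPUs
NVLINK_SWITCH_TRAYS = 9   # 9 NVLink switch trays in the middle of the rack

gpus = COMPUTE_TRAYS * SUPERCHIPS_PER_TRAY * GPUS_PER_SUPERCHIP
cpus = COMPUTE_TRAYS * SUPERCHIPS_PER_TRAY

print(f"GPUs per rack: {gpus}")        # 72 -> the "72" in NVL72
print(f"Grace CPUs per rack: {cpus}")  # 36

# Seeing roughly 7-8 switch trays in a partial view is consistent
# with the 9 switch trays of a full NVL72 rack.
```

So counting 7-8 switch trays in a photo taken at an angle lines up well with the 9 that a full NVL72 rack carries.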
This is a big deal for xAI, as Michael Dell had previously shown the Dell side of the NVIDIA HGX Hopper systems as air-cooled. NVIDIA's GB200 NVL72 design needs to be liquid-cooled, so this would signal the transition to liquid cooling for Dell at xAI. The bigger implication is that xAI is starting to get GB200 systems. Given this is Dell, this is unlikely to be a GH200 Oberon system, as we discussed in our Substack in September.
We can also see NVIDIA BlueField-3 DPUs installed in the nodes, so it appears xAI is continuing to use NVIDIA NICs.
If you want to see the Supermicro side of the Memphis-based xAI Colossus, you can see our Inside the 100K GPU xAI Colossus Cluster that Supermicro Helped Build for Elon Musk. Here is the video for that one.
In that video, we show how xAI was already deploying high-power racks earlier this year with 64x NVIDIA H100 GPUs per rack.
Final Words
The fact that xAI is getting GB200 supply is huge. Blackwell supply is starting to come online. Having a company like xAI, which operates at a much higher operational tempo than others in the industry, means those Blackwell GPUs are going to make an impact sooner rather than later. For Dell, it is great to see an evolved offering being deployed.
It is also a big win for Arm, since this would be a transition from x86 to Arm for the host CPUs. Arm cores are also likely used in the networking via the BlueField-3 DPUs. From what we have heard, there is or was an Oberon-style system with x86, but given the timing, this is most likely Arm-based.
Again, as someone who has seen the xAI team in action and the first phase being built out: if you are a STH reader and want to join one of the A-teams in the industry, the xAI folks are doing monumental work here.
Other analysts: "what is this?"
Patrick: "based on the reflection you can see…"
Superior take
Been a while since I saw this density of dick-riding outside of r/teslalounge. “Monumental work” “A-team” “higher operational tempo” … all used to describe an organization that has literally never accomplished anything.
Yes, ever since the cyberstuck arrived, Patrick has fallen in love with Elon…
Lmao at these people in the comments unable to compartmentalize tech from politics
I don’t get the Elon hate? Boeing, Ford, Iridium investors? This xAI team has built the biggest AI cluster the fastest and keeps building. Grok is good at some things already, and they’re just getting their GPUs online. It took something like two months to train Llama 3, so if they finished in August, best case, if they did a huge training run first (which you wouldn’t do before at least trying some smaller step ones), you’d have finished training two months ago. They’ve got plenty of data and plenty of GPUs. They’ll get there.
There’s also AI deniers who can’t see AI beyond ChatGPT 4’s release and think all AI is chatbots.
What is this system going to be used for though – training self driving car AIs, modelling SpaceX vehicles, training LLMs, generating “content” for Twitter? It is a very impressive machine but it’s not clear what he wants to do with it.
“There’s also AI deniers who can’t see AI beyond ChatGPT 4’s release and think all AI is chatbots.” Sure we’re not just realists? Automatic subtitles are fantastic, and if you have accurate big datasets like in scientific research, ANNs are fantastic already. But AI-created art sucks, and none of the Llamas reason in any useful way. I want robots that can fold my laundry without $2,000 inference costs going to Grok; I don’t want a “copilot” that squirts out syntactically perfectly correct but bad and slow code that is hard to detect in code reviews, etc. I don’t want most of our jobs cut without the technology being mature enough and us having other job opportunities!

I also mostly don’t like Elon, and I like “Stranger in a Strange Land” (although I’m not a fan of Heinlein in general), so the word Grok rubs me the wrong way.

Also, I worked IT support for an engineering dept. and I have seen how smart great engineers can be, and how insanely genius the exceptional ones are (the one-in-200/300 type students). In comparison with them, I don’t get the cult of Elon. I can’t pretend he’s Einstein and Tony Stark molded into one. He’s not nearly as smart and capable as either of those two, nor any of the capable engineers I’ve met, let alone the exceptional ones. But he does manage to attract a lot of VC money and wants to change things that are insanely hard to do, which I applaud him for. I only think it’s sad that we need a Tech Populist now to get funded to do, with a couple of orders of magnitude more money, what the 60s-70s engineers and astronauts on the Apollo missions accomplished all these years ago.