Exploring the NVIDIA HGX B200 Lambda AI Cluster at Cologix with Supermicro


VAST Storage on Supermicro Servers

VAST Data’s storage is extremely popular in AI clusters. Lambda already had tens of petabytes of clustered storage online when we visited. The GPU compute is important, but keeping those GPU servers fed with data is just as critical, which is why high-speed storage solutions like those from VAST are used.

Supermicro Lambda Cologix VAST 3

VAST used to be more of a hardware company, but now we see them as more of a software company. As part of their solution, you can use different servers. Lambda has Supermicro servers here with 2.5″ NVMe drives for scaling out the storage.

Supermicro Lambda Cologix Supermicro Storage under VAST 6

Provisioning GPU clusters quickly with 1-click clusters is only part of the equation. A customer also needs to move data from wherever it is stored into the data center with the GPUs. It may also need to save output data for future training and compliance. As a result, these arrays are often filled with high-capacity SSDs.

One other small note for those who do not work on these regularly: if you have NVIDIA B200s in a facility like this, as Lambda does, large, well-known AI shops will seek to lease capacity for multiple years as soon as they can get it. That is often done to increase the scale of inferencing operations. Seeing these arrays in Lambda’s cluster is unlikely to be an accident, since those well-known AI shops leasing B200 clusters will have storage preferences as well.

Next, let us discuss cooling at the data center level.

7 COMMENTS

  1. Article good. Video better. I’m not sure I’ve seen a dc video as fast paced as your xAI video since that one. It was like I was watching some gripping mission impossible action movie not some boring dc video. I don’t know how you did that, but keep doing more of it

  2. Eagerly awaiting the day when the AI bubble bursts after investors figure out that AI isn’t a magic black box that replaces human employees. Then some of this hardware can hit the secondary market for prices that hobbyists are willing to pay for hobbyist use of AI.

  3. AI isn’t some investor-fueled bubble. Companies are actively spending massive amounts of money on it. If the companies are collectively spending hundreds of billions of dollars per year on AI just to satisfy perceived investor interest then there is a much bigger problem in the marketplace than an AI bubble. If it is a bubble it is in the hopes and expectations of technology companies, not in the speculation of investors.

  4. @Matt it’s become so bad that the likes of Microsoft have started ‘trimming the fat’ despite record revenues, and are desperately tacking AI (‘copilot’) onto any popular product despite customer pushback that it doesn’t really add any value let alone justify a price increase.

  5. Looking at Cologix’s locations it seems that the most northerly location is Montreal Canada.

    Checking the Open Canada website for “Permafrost, Atlas of Canada, 5th Edition” we see that there’s so many better locations available for cooling a server farm (at the lowest possible price). Just as you probably don’t want to setup in southern California (because of the temperature and electricity costs) you wouldn’t setup in southern Canada (when you can move to northern Canada, near a dam).

    Not my hundreds of millions ….

  6. Within a year or two, we’ll be looking back and wondering how absolutely *everyone* seemed to think this was a great idea.

    I’d love to see this equipment put to work doing scientific research but I fear that it’s already too tightly optimized for AI work.

    Regardless, fascinating view into how these clusters come together. Thanks, Patrick!
