Air Cooling an AI Cluster with Cologix
Cologix operates a number of data centers, both in the Columbus, Ohio area and elsewhere, and it has both liquid-cooled and air-cooled facilities. We happened to tour one of its air-cooled locations.

Likewise, Lambda has a number of clusters. This was just one that we could take a look at while it was both operating and also being expanded.

Most data centers and clusters I get to tour have something iconic about them. For me, the blue cooling walls are going to be the iconic part of this tour. The scale of these, combined with them being a contrasting blue, is just neat.

Behind the blue mesh are heat exchangers. Chillers mounted on the roof circulate fluid to what are effectively massive radiators like you would find in a car. The cooler room air in the cold aisles is sucked into the Supermicro GPU servers and through heatsinks where heat is removed from components and transferred into the air.

That warmed air is then contained in a hot aisle where it rises and is pulled around to the heat exchangers. My trips through the hot aisles were brief as they were both noisy and hot.

Those heat exchangers remove heat from the air, transferring the heat to the fluid loop. After that heat is removed, the air is then recirculated into the cold aisles of the data center.
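
For a rough sense of how much air has to move through this loop, the standard sensible-heat relation (heat load = air density × specific heat × volumetric flow × temperature rise) can be rearranged to estimate airflow. This is a minimal sketch, not anything measured at this facility: the air properties are textbook approximations for room-temperature air, and the 10 kW load and 15 K temperature rise in the example are hypothetical values chosen only for illustration.

```python
# Estimate the airflow needed to carry a given heat load from the cold aisle
# to the hot aisle. All constants below are assumptions, not article data.

AIR_DENSITY_KG_M3 = 1.2            # approx. density of air near room temperature
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approx. specific heat of air at constant pressure
M3S_TO_CFM = 2118.88               # cubic meters per second to cubic feet per minute


def required_airflow_m3s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to remove heat_load_w watts
    when the air warms by delta_t_k kelvin across the servers."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


if __name__ == "__main__":
    # Hypothetical example: a 10 kW server with a 15 K cold-to-hot aisle rise.
    flow = required_airflow_m3s(10_000, 15)
    print(f"{flow:.2f} m^3/s (~{flow * M3S_TO_CFM:.0f} CFM)")
```

The takeaway is the inverse relationship: for a fixed heat load, a smaller temperature rise between cold and hot aisle means proportionally more air has to be moved.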

There are a number of ways that folks exchange air in a data center. Technically, there is a liquid loop that is removing heat from the cluster and bringing it outside. In the industry, however, we do not call this liquid cooling an AI cluster. Instead, this is air cooling because the heat goes from the GPU/CPU to air via heatsinks, and only then to the data center's liquid loops.

We use the term liquid cooling, like we see on Lambda’s Supermicro GB200 NVL72 rack, to describe heat going from the GPU/CPU directly into liquid cooling blocks.

Cooling is fun, but you are probably wondering about power. Let us get to that next.

Powering the Cluster at Cologix
The Cologix data center has its own power substation.

While this is not part of the AI cluster tour, some folks have never seen what this looks like.

We did not tour inside the fenced area for safety reasons. This is a 36MW facility.

For some sense of scale, here is the row of power containers with things like battery banks outside of the facility. Each of these is rated for something like 1.6MW. If you squint, you might be able to see me walking through this corridor between the data center and these pods.
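
To put the two figures above side by side, here is a small back-of-the-envelope sketch. It is only arithmetic on the numbers quoted in the text (36MW and roughly 1.6MW per container); the article does not say how many containers are installed or whether their ratings are meant to add up to the facility figure, so the result is an illustration of scale, not a count of what is on site. The function name `units_needed` is hypothetical.

```python
# Back-of-the-envelope scale check using the figures quoted in the text.
# Working in whole kilowatts keeps the arithmetic in exact integers.

FACILITY_KW = 36_000   # "36MW facility" from the article
CONTAINER_KW = 1_600   # "something like 1.6MW" per power container


def units_needed(load_kw: int, unit_kw: int) -> int:
    """Smallest number of units of size unit_kw that covers load_kw
    (integer ceiling division)."""
    if unit_kw <= 0:
        raise ValueError("unit_kw must be positive")
    return -(-load_kw // unit_kw)


if __name__ == "__main__":
    n = units_needed(FACILITY_KW, CONTAINER_KW)
    print(f"{n} containers of {CONTAINER_KW} kW would cover {FACILITY_KW} kW")
```

This ignores redundancy and any load not backed by these containers, both of which a real design would account for.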

That power is then brought inside the facility and distributed via busbars/busways. We looked at busbars/busways last year in another video.

An advantage of using this type of setup is that one can use tap-off boxes to bring power of the right type to a rack via a movable overhead box.

You can see examples of this with different tap-off boxes being used for different racks, depending on the type of rack being provisioned.

This may seem like a small feature, but if an upgrade happens in the future and racks move or need different types of power, these tap-off boxes take only a few minutes to swap and move.

We have looked at the GPU servers, networking, storage, cooling, and power so far. Still, there is quite a bit more to AI clusters that is often overlooked. Let us get to those next.



Manmade horrors beyond comprehension!
Article good. Video better. I’m not sure I’ve seen a dc video as fast paced as your xAI video since that one. It was like I was watching some gripping mission impossible action movie not some boring dc video. I don’t know how you did that, but keep doing more of it
Eagerly awaiting the day when the AI bubble bursts after investors figure out that AI isn’t a magic black box that replaces human employees. Then some of this hardware can hit the secondary market for prices that hobbyists are willing to pay for hobbyist use of AI.
AI isn’t some investor-fueled bubble. Companies are actively spending massive amounts of money on it. If the companies are collectively spending hundreds of billions of dollars per year on AI just to satisfy perceived investor interest then there is a much bigger problem in the marketplace than an AI bubble. If it is a bubble it is in the hopes and expectations of technology companies, not in the speculation of investors.
@Matt it’s become so bad that the likes of Microsoft have started ‘trimming the fat’ despite record revenues, and are desperately tacking AI (‘copilot’) onto any popular product despite customer pushback that it doesn’t really add any value let alone justify a price increase.
Looking at Cologix’s locations it seems that the most northerly location is Montreal Canada.
Checking the Open Canada website for “Permafrost, Atlas of Canada, 5th Edition” we see that there are so many better locations available for cooling a server farm (at the lowest possible price). Just as you probably don’t want to set up in southern California (because of the temperature and electricity costs) you wouldn’t set up in southern Canada (when you can move to northern Canada, near a dam).
Not my hundreds of millions ….
Within a year or two, we’ll be looking back and wondering how absolutely *everyone* seemed to think this was a great idea.
I’d love to see this equipment put to work doing scientific research but I fear that it’s already too tightly optimized for AI work.
Regardless, fascinating view into how these clusters come together. Thanks, Patrick!