This is the first part of a mini-series we are doing on Supermicro liquid cooling. A few weeks ago we visited Supermicro’s headquarters in San Jose, California, and got to see the company’s new liquid cooling solutions first-hand. We recorded a video but thought it was worth doing a few pieces on what we saw. The first in this series looks at Supermicro’s custom liquid cooling distribution. Supermicro is going a step further than many of its competitors: it is building its solution all the way down to the coolant distribution unit (CDU) and packaging the entire solution together.
Supermicro Custom Liquid Cooling Rack – A Look at the Cooling Distribution
First, here is the video, where we show all of this, including integration with Supermicro’s high-end NVIDIA H100 AI server and various BigTwin servers.
As always, we suggest watching this in its own browser window, tab, or app for the best viewing experience. Also, we are going to call this sponsored, since Supermicro paid for George and me to fly out on short notice when an NVIDIA H100 system was available for only a few days so that we could look at everything.
The heart of the new cooling system is Supermicro’s CDU. Years ago we saw a 40kW-capacity unit, but that is quickly becoming too little for a single rack, so 80kW and 100kW solutions are now on the way.
The 4U CDU performs several functions. Its primary purpose is to pump fluid through the loops and exchange heat between the in-rack liquid cooling loop and a facility water (or other heat exchanger) loop.
In the video, we show removing and installing the pumps, which are hot-swappable on this CDU, so the rack can stay in operation even if a pump fails.
The CDU has a small LCD screen showing management information. It is designed so that its sensor data and management align with the rest of the server infrastructure.
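To put 80kW and 100kW in perspective, the basic heat balance a CDU works against is Q = ṁ·c_p·ΔT: the heat carried away equals the coolant mass flow times its specific heat times the temperature rise across the rack. Here is a minimal Python sketch of that arithmetic. It assumes plain water (c_p of about 4186 J/(kg·K) and about 1kg per liter) and a 10K supply-to-return rise; these are illustrative numbers, not Supermicro specifications, and glycol mixtures with a lower specific heat would need more flow.

```python
# Minimal heat-balance sketch: Q = m_dot * c_p * delta_T.
# Assumptions (illustrative, NOT Supermicro specifications): the coolant is
# plain water with c_p ~= 4186 J/(kg*K) and density ~= 1 kg/L, and delta_T is
# the supply-to-return temperature rise across the rack.

WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_L = 1.0


def required_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Return the coolant flow in liters per minute needed to carry
    heat_load_w watts with a temperature rise of delta_t_k kelvin."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_load_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    # Convert kg/s to L/s via density, then to L/min.
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    for load_kw in (40, 80, 100):
        flow = required_flow_lpm(load_kw * 1000, delta_t_k=10.0)
        print(f"{load_kw} kW at 10 K rise: {flow:.0f} L/min")
```

With those assumptions, the script prints roughly 57, 115, and 143 L/min for 40kW, 80kW, and 100kW. Halving the allowed temperature rise doubles the required flow, which is one reason pump capacity and redundancy matter more as rack power climbs.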
We opened the controller area door and saw a Raspberry Pi driving this screen. We have seen Raspberry Pis in a number of CDUs across the industry.
Here is a quick look inside the sensor and control compartment.
At the rear, we can see the sensor area on the left. This is where sensors hook up, including level sensors and leak sensors. The system is not supposed to leak, but operators want to know if a leak occurs, so the solution has leak sensors. There are also redundant power supplies to power the CDU.
We can also see the large liquid cooling pipes that can exchange fluid with rack manifolds that we will look at next.
Supermicro Liquid Cooling Rack Manifolds
In liquid cooling, the rack manifolds distribute the cooler liquid from the CDU to the various liquid cooling blocks, then collect the liquid from the systems and bring it back to the CDU for heat extraction. Here is an example of a GPU rack designed to house 4x NVIDIA H100 8-GPU servers. One can see the six APC PDUs installed in the rack just to provide power to the four systems, CDU, and switches.
Since there are fewer systems, the zero U manifolds have larger nozzles that feed the horizontal manifolds. Blue is for the cool supply liquid.
Red is for the warmed return liquid.
One cool feature is Supermicro’s horizontal manifolds, used in deployments such as its AI servers.
These horizontal manifolds minimize the number of runs through the rack. They also make servicing the GPU systems very easy.
All of the fittings Supermicro uses are leakless quick disconnects, which is standard in the industry. We were told they were designed to be connected and disconnected with one hand, and that was certainly the case.
Of course, another great use of liquid cooling is dense CPU compute. For example, we recently did a piece on the Intel Xeon MAX 9480 with 64GB of HBM2e onboard. 2U 4-node platforms like the BigTwin became popular for their density when CPUs used 65-150W. Now, with CPUs at 350W each and soon climbing to 500W, liquid cooling is going to be required in more installations.
For that, Supermicro has another zero U manifold with a denser set of smaller nozzles.
Each supply and return nozzle pair is designed to service one node with two CPUs. In a 2U 4-node chassis, that means four CPUs per U, or two cool blue and two warm red nozzles per 1U.
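The per-U math is easy to sanity check. This hypothetical Python sketch models a 2U 4-node chassis with two CPUs per node and one supply and one return nozzle per node, then uses the 350W and 500W CPU figures mentioned earlier to estimate CPU heat per U. The `Chassis` class and its methods are our own illustration, not anything from Supermicro, and the heat estimate ignores memory, VRMs, NICs, and other components that also put heat into the loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    """Hypothetical model of a multi-node chassis fed by a zero U manifold.
    Assumes one supply and one return nozzle per node."""
    height_u: int
    nodes: int
    cpus_per_node: int

    def cpus_per_u(self) -> float:
        return self.nodes * self.cpus_per_node / self.height_u

    def nozzles_per_u(self) -> tuple[float, float]:
        """Return (supply, return) nozzles per U, one of each per node."""
        per_u = self.nodes / self.height_u
        return per_u, per_u

    def cpu_heat_per_u_w(self, cpu_tdp_w: float) -> float:
        """CPU-only heat per U; ignores memory, VRMs, NICs, and so on."""
        return self.cpus_per_u() * cpu_tdp_w


if __name__ == "__main__":
    two_u_four_node = Chassis(height_u=2, nodes=4, cpus_per_node=2)
    supply, ret = two_u_four_node.nozzles_per_u()
    print(f"CPUs per U: {two_u_four_node.cpus_per_u():.0f}")
    print(f"Nozzles per U: {supply:.0f} supply (blue), {ret:.0f} return (red)")
    for tdp in (350, 500):
        heat = two_u_four_node.cpu_heat_per_u_w(tdp)
        print(f"CPU heat per U at {tdp}W: {heat:.0f} W")
```

That works out to four CPUs, two blue and two red nozzles, and roughly 1.4kW of CPU heat per U at 350W, rising to 2kW per U at 500W.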
It was fun to see both options at Supermicro.
If you watch the video, you will see we have quite a bit more coming from this trip. Still, we wanted to show off the Supermicro liquid cooling distribution side first. Since the company is doing extremely well on the AI server front, it has customers who want liquid cooling. Instead of calling up another company to add liquid cooling, Supermicro is designing the racks with liquid cooling itself. That is a big difference in strategy.
Stay tuned for more on the liquid cooling front. This is going to be a bigger topic as we move into the 2024 era of servers, and we have a number of liquid cooling pieces in the pipeline because it is time to start the discussion on transitioning to liquid cooling.