Building DeepLearning12 in the Gigabyte G481-S80
The Gigabyte G481-S80 is a 4U server, but it takes a large amount of hardware to populate. We had 2x Intel Xeon Gold 6138 CPUs, 12x 16GB (later 24x 16GB) DDR4-2666 RAM modules, multiple SSDs (NVMe and SATA), 8x NVIDIA Tesla P100 SXM2 GPUs, and ten massive heatsinks. For networking, we used four 100GbE/EDR InfiniBand Mellanox ConnectX-4 NICs, a 40GbE ConnectX-3 Pro NIC, and a dual 25GbE OCP mezzanine card.
The CPU and RAM area was extremely easy to work on. Once the top cover was off, it felt like almost any other Intel Xeon Scalable CPU and memory installation. There are a total of 12x DDR4 DIMM slots per CPU for 24x total. We used 192GB and 384GB configurations, but that is on the lower end of what you will see in a Gigabyte G481-S80.
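As a quick sanity check on the memory figures above, here is a minimal sketch of the capacity arithmetic. It assumes every populated slot holds the same 16GB DIMM we used; the function and constant names are ours, purely for illustration.

```python
# Memory capacity arithmetic for a uniform DIMM population.
# Assumption: all installed DIMMs are the same size (16GB in our build).

CPUS = 2
DIMM_SLOTS_PER_CPU = 12
TOTAL_SLOTS = CPUS * DIMM_SLOTS_PER_CPU  # 24 slots in the chassis


def total_memory_gb(dimms_installed: int, dimm_size_gb: int = 16) -> int:
    """Return total RAM in GB for a given number of identical DIMMs."""
    if not 0 <= dimms_installed <= TOTAL_SLOTS:
        raise ValueError(f"expected 0..{TOTAL_SLOTS} DIMMs, got {dimms_installed}")
    return dimms_installed * dimm_size_gb


if __name__ == "__main__":
    print(total_memory_gb(12))           # initial 12x 16GB configuration
    print(total_memory_gb(TOTAL_SLOTS))  # later 24x 16GB configuration
```

Passing a larger `dimm_size_gb` shows how quickly capacity climbs beyond what we installed when all 24 slots are filled.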
The main area of the Gigabyte G481-S80 is where the NVIDIA Tesla SXM2 modules go, but there is more. There are slots for four 100Gbps networking cards, one for every two GPUs that sit on the same PCIe root for GPUDirect RDMA. Here, just use Mellanox ConnectX-4 or ConnectX-5 and do not bother with any other configuration.
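To make the one-NIC-per-two-GPUs layout concrete, here is a small sketch that groups eight GPUs into four NIC groups. The index assignment is an assumption for illustration only: it pairs consecutive GPU indices, which may not match how the operating system actually enumerates devices on this platform.

```python
# Illustrative GPU-to-NIC grouping: one 100Gbps NIC per two GPUs.
# Assumption: consecutive GPU indices share a PCIe root with one NIC.
# The real enumeration order on a given system may differ.


def pair_gpus_with_nics(num_gpus: int = 8, gpus_per_nic: int = 2) -> dict[int, list[int]]:
    """Map each hypothetical NIC index to the GPU indices it would serve."""
    if num_gpus % gpus_per_nic != 0:
        raise ValueError("GPU count must divide evenly across NICs")
    num_nics = num_gpus // gpus_per_nic
    return {
        nic: list(range(nic * gpus_per_nic, (nic + 1) * gpus_per_nic))
        for nic in range(num_nics)
    }


if __name__ == "__main__":
    for nic, gpus in pair_gpus_with_nics().items():
        print(f"NIC {nic} <-> GPUs {gpus}")
```

On a running system, `nvidia-smi topo -m` reports the actual GPU and NIC affinity, and that output should be preferred over any assumed ordering like the one above.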
We have an installation video that you can see here. We installed the SXM2 modules ourselves. After having done it, we highly recommend having these installed at the factory. We spoke to NVIDIA and several vendors about it, and the general consensus is that this is a difficult installation with a high risk of damaging expensive GPUs. This is an area where PCIe installation is much easier.
Once the GPUs are installed, one is left with what we call an SXM2 “heatsink forest.” Front and rear GPUs have different size heatsinks, but the array is absolutely menacing. Then again, these heat pipe coolers need to dissipate heat from NVIDIA Tesla SXM2 modules with TDPs of up to 300W, the same class of module used in the DGX-1. We are told the DGX-2 GPUs can hit 350W, which is certainly a difference between the two. Compared to a PCIe cooling solution, you can see how this is much more efficient. You can also see why Gigabyte put water cooling hose holes in the rear of the chassis, as these are systems that one can easily see getting the water cooling treatment.
For some perspective, the area depicted is pre-heated by the two Intel Xeon Scalable CPUs, RAM, and storage. It then has to handle cooling around 2500W worth of components including the 8x SXM2 GPUs, PCIe switches, and 100Gbps networking.
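A rough sketch of that thermal budget follows, using only the figures quoted here: eight GPUs at up to 300W each and a total of around 2500W. The watts-to-BTU/hr factor is the standard conversion. The remainder attributed to PCIe switches and networking is only what the rounded 2500W figure implies, not a measured value.

```python
# Rough thermal budget for the GPU section of the chassis.
# Inputs come from the figures quoted in the text; the remainder is
# implied by a rounded total, so treat it as approximate.

GPU_COUNT = 8
GPU_TDP_W = 300              # up to 300W per SXM2 module
SECTION_TOTAL_W = 2500       # "around 2500W" for this area
WATTS_TO_BTU_PER_HR = 3.412  # standard conversion factor

gpu_power_w = GPU_COUNT * GPU_TDP_W                  # GPUs alone
other_power_w = SECTION_TOTAL_W - gpu_power_w        # switches, NICs (implied)
section_btu_per_hr = SECTION_TOTAL_W * WATTS_TO_BTU_PER_HR

if __name__ == "__main__":
    print(f"GPUs: {gpu_power_w} W")
    print(f"Implied remainder: {other_power_w} W")
    print(f"Heat load: {section_btu_per_hr:.0f} BTU/hr")
```

The GPUs account for nearly the entire figure, which is why the heatsink array dominates this part of the chassis.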
We did not get a great photo of this, but in the rear of the chassis, there is a curious pull-out option next to the power supplies. Inside this pop-out area is a Gigabyte CLBGM10 module. It is a fairly large PCB that ultimately allows for the addition of an OCP networking module. In this case, we have a dual 25GbE Broadcom OCP mezzanine card installed.
We have a video talking about the hardware in detail that is worth checking out.
This is, by far, the most difficult server I have ever built. 2U 4-node servers, blade servers, and even 8x and 10x GPU PCIe servers are much easier. The design of the Gigabyte G481-S80 facilitated installation; it simply took a long time. We strongly recommend having someone else build your SXM2 GPU server.
In terms of racking and stacking, my current deadlift is around 400lbs/181kg and I was able to rack the system at hip height without too much trouble. In the lab, we keep lower U’s for large and heavy chassis such as this one. Anything higher than hip height and I would have had to use our server lift. The rail system for the G481-S80 uses “L” shaped shelves, which makes installation very easy. These boxes can weigh 100-200lbs. If you want to deploy racks of these, you may also want to ensure that you have the power, cooling, and floor support capacity to handle such a deployment.
Next, we are going to look at the Gigabyte G481-S80 topology. In the deep learning/AI space, system topology is a big deal. There is a reason you want this type of architecture over a PCIe architecture, and we will show why.