Gigabyte G481-S80 DeepLearning12 Topology
We first wanted to start with a bit about DGX-1 / DGX-1.5 class server topology. There is a difference between the Pascal (Tesla P100) and Volta (Tesla V100) generations. The newer NVIDIA Tesla V100 GPUs have six 50GB/s NVLink links, while the older Tesla P100 GPUs we are using have four 40GB/s links. That means the per-link speed and latency of the Tesla V100 are better, and it also means that each GPU in a V100 configuration has more aggregate bandwidth to other GPUs due to the additional links.
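As a quick sanity check on those numbers, the per-GPU aggregate NVLink bandwidth is simply the link count times the per-link figure (using the bidirectional GB/s numbers quoted above); a minimal sketch:

```python
# Per-GPU aggregate NVLink bandwidth, using the bidirectional
# per-link figures quoted above (GB/s).
nvlink_gens = {
    "Tesla P100 (Pascal)": {"links": 4, "gbps_per_link": 40},
    "Tesla V100 (Volta)": {"links": 6, "gbps_per_link": 50},
}

for name, g in nvlink_gens.items():
    total = g["links"] * g["gbps_per_link"]
    print(f"{name}: {g['links']} x {g['gbps_per_link']}GB/s = {total}GB/s aggregate")
# Tesla P100 (Pascal): 4 x 40GB/s = 160GB/s aggregate
# Tesla V100 (Volta): 6 x 50GB/s = 300GB/s aggregate
```

So even though each V100 link is only 25% faster, the two extra links nearly double the aggregate GPU-to-GPU bandwidth per GPU versus the P100 generation.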
In these architectures, there are two PCIe 3.0 x16 links from each CPU to PCIe switches, in this case Broadcom (PLX) PCIe switches. Each switch is connected to two GPUs and one PCIe 3.0 x16 networking slot. The GPUs are connected to each other via NVLink, but they are also connected to PCIe through this switch complex. More importantly for GPUDirect RDMA, each pair of GPUs that shares a PCIe switch also sits in the same PCIe root complex as a Mellanox InfiniBand card.
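To make that layout concrete, here is an illustrative sketch that classifies the PCIe path between two devices the way `nvidia-smi topo -m` labels them: PIX for a shared PCIe switch, NODE for the same CPU via different switches, and SYS for a path crossing the inter-socket link. The GPU-to-switch assignment below is our assumed model of this class of topology, not values read from the system:

```python
# Hypothetical layout for a DGX-1 class system: two PCIe switches
# per CPU, two GPUs per switch (assignment assumed for illustration).
GPU_TO_SWITCH = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
SWITCH_TO_CPU = {0: 0, 1: 0, 2: 1, 3: 1}

def pcie_path(gpu_a: int, gpu_b: int) -> str:
    """Classify the PCIe path between two GPUs, nvidia-smi topo style."""
    sw_a, sw_b = GPU_TO_SWITCH[gpu_a], GPU_TO_SWITCH[gpu_b]
    if sw_a == sw_b:
        # Same PCIe switch: the best case for peer-to-peer / GPUDirect RDMA
        return "PIX"
    if SWITCH_TO_CPU[sw_a] == SWITCH_TO_CPU[sw_b]:
        # Same CPU root complex, different switches
        return "NODE"
    # Path traverses the QPI/UPI inter-socket link
    return "SYS"

print(pcie_path(0, 1))  # PIX
print(pcie_path(0, 2))  # NODE
print(pcie_path(0, 4))  # SYS
```

In practice, GPU-to-GPU pairs will show NV link entries in the real `nvidia-smi topo -m` matrix since NVLink takes priority; the PCIe classification above is what matters for the GPU-to-NIC path that GPUDirect RDMA uses.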
We had a ton of NICs installed in the system, so we wanted to show what the topology looks like:
For comparison, here is our DeepLearning10 8x PCIe GPU (NVIDIA GTX 1080 Ti) topology. You will notice that this topology traverses PCIe, and sometimes the QPI/UPI links, more often. You will also notice the absence of the Mellanox cards in that solution, as they were not showing up at the time with GTX 1080 Ti's.
Taking a step up the stack to the system itself, you can see the motherboard topology here. The front panel connectivity is handled mostly through the PCH and CPU0.
Here is the lstopo view of the system topology that we took during testing.
These days, if you want faster GPU to GPU communication, this type of solution is what you want if you cannot get into a DGX-2 / HGX-2 class 16x GPU 10kW system. We covered How Intel Xeon Changes Impacted Single Root Deep Learning Servers, which makes single root designs less desirable in the Intel Xeon Scalable generation.
Next, we are going to start looking at performance before getting into power consumption, cost of ownership, and then our final words.