One of the more interesting announcements this week is the Supermicro SYS-6049GP-TRT server, which supports up to 20x NVIDIA Tesla T4 inferencing GPUs. The company made the announcement without posting the system specs on its website, so we have a bit of analysis. This announcement is especially interesting since 20x NVIDIA Tesla T4 GPUs would need a total of 320x PCIe 3.0 lanes (20 cards at x16 each) to run at full speed. Intel Xeon Scalable platforms have a maximum of 96 CPU lanes in dual-socket configurations, plus a nominal amount from the chipset. AMD EPYC has up to 128 PCIe lanes (minus a few for system peripherals). There is also the question of physically fitting the GPUs. We think we know what Supermicro is doing to make this work.
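The lane arithmetic above can be checked with a quick sketch. The lane counts come straight from the figures quoted in this article; the function names are our own and purely illustrative, and the calculation ignores chipset lanes and any lanes reserved for NICs or storage.

```python
# Lane budget sketch using the figures quoted in the article.
# Function and constant names are illustrative, not from any vendor tool.
LANES_PER_GPU = 16   # a Tesla T4 running at full width uses a x16 link
GPU_COUNT = 20

PLATFORM_LANES = {
    "Dual Intel Xeon Scalable": 96,  # CPU lanes only, chipset lanes excluded
    "AMD EPYC": 128,                 # before subtracting peripheral lanes
}


def lanes_required(gpu_count, lanes_per_gpu=LANES_PER_GPU):
    """Total PCIe lanes needed to give every GPU a full-width link."""
    return gpu_count * lanes_per_gpu


def oversubscription(gpu_count, platform_lanes):
    """Ratio of lanes the GPUs want to lanes the host CPUs can supply."""
    return lanes_required(gpu_count) / platform_lanes


if __name__ == "__main__":
    print(f"Lanes required: {lanes_required(GPU_COUNT)}")
    for name, lanes in PLATFORM_LANES.items():
        ratio = oversubscription(GPU_COUNT, lanes)
        print(f"{name}: {lanes} lanes, {ratio:.2f}x oversubscribed")
```

Any ratio above 1.0 means the host cannot give every card a dedicated x16 link, which is why the slot count alone does not tell the whole story.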
Over the past few days, we have received quite a few e-mails asking about the announcement and photo, so we wanted to cover them in a bit more depth.
Supermicro shared a single banner image of the SYS-6049GP-TRT, and that image shows only 16x NVIDIA GPUs installed.
We covered the NVIDIA Tesla T4 inferencing GPU at its launch. One reason that the new GPUs generated so much buzz is that the new NVIDIA Tesla T4 is a low profile card which greatly expands its ability to fit into general purpose server form factors.
One will note that the Tesla GPUs pictured in Supermicro’s banner image are full height Volta generation Tesla design cards, although there are only a handful of people in the world who would catch that. To STH readers, this should be really interesting. NVIDIA makes a “baby” Tesla V100 single slot 150W card. It is possible that the design Supermicro showed off is a 16x NVIDIA Tesla V100 (150W PCIe) 4U server.
We have seen the basic chassis building block that Supermicro used before, in DeepLearning10 and DeepLearning11 as well as in our Supermicro 4028GR-TR 4U 8-Way GPU SuperServer Review.
The chassis has 21 full height PCIe expansion cutouts in the back, as can be seen above.
We had the daughterboard diagrams in the STH CMS from our How Intel Xeon Changes Impacted Single Root Deep Learning Servers article. This is not what the Supermicro SYS-6049GP-TRT is using, but it gives a PCB layout to follow:
We think that the Supermicro SYS-6049GP-TRT is using a PCB with additional PCIe switches driving additional PCIe expansion slots. Instead of leaving ten PCIe slots at dual-width spacing, Supermicro has another PCB with perhaps 8 or 9 additional PCIe slots. Either there is a large PCIe switch complex, or each GPU is getting fewer PCIe lanes.
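To put a number on the "fewer PCIe lanes" option, here is a rough sketch of the widest standard link each GPU could get if the host lanes were split evenly with no switches at all. It assumes every CPU lane goes to a GPU, which no real system does, so treat the result as an upper bound. The helper name is hypothetical.

```python
# Widest standard PCIe link per GPU when host lanes are split evenly.
# Assumption: all host lanes are dedicated to GPUs (an optimistic upper bound).
STANDARD_WIDTHS = (16, 8, 4, 2, 1)  # link widths commonly used for add-in cards


def widest_direct_link(host_lanes, gpu_count):
    """Return the largest standard link width that fits each GPU's even share."""
    share = host_lanes / gpu_count
    for width in STANDARD_WIDTHS:
        if width <= share:
            return width
    return 0  # not even one lane per GPU


if __name__ == "__main__":
    for host_lanes in (96, 128):
        width = widest_direct_link(host_lanes, 20)
        print(f"{host_lanes} host lanes, 20 GPUs: x{width} per GPU")
```

With 20 cards, both the 96-lane and 128-lane cases land on x4 links, which is why a PCIe switch complex looks like the more plausible of the two options.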
It appears as though fitting 20x single-width cards is possible in Supermicro’s standard chassis design. With 8kW of available power using four 2kW power supplies, the Supermicro SYS-6049GP-TRT can certainly handle that number of NVIDIA Tesla T4 GPUs for inferencing. Perhaps more intriguing is the possibility that the system pictured is actually fitted with 16x NVIDIA Tesla V100 single-width 150W GPUs. Those would have a 2.4kW GPU draw (16 x 150W), just like 8x 300W double-width GPUs.
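The power figures above are easy to sanity check. In the sketch below, the PSU and V100 numbers come from this article, while the 70W figure for the Tesla T4 is an assumption taken from NVIDIA's published board power and is not stated here. The calculation covers GPU draw only; CPUs, memory, fans, and PSU redundancy are left out.

```python
# GPU power budget sketch. Names are illustrative.
PSU_COUNT = 4
PSU_WATTS = 2000   # four 2kW supplies, per the article
T4_WATTS = 70      # assumption: NVIDIA's published T4 board power


def total_supply(psu_count=PSU_COUNT, psu_watts=PSU_WATTS):
    """Nameplate power of all supplies combined, ignoring redundancy."""
    return psu_count * psu_watts


def gpu_power(count, watts_each):
    """Combined board power for a set of identical GPUs."""
    return count * watts_each


if __name__ == "__main__":
    configs = {
        "20x Tesla T4": gpu_power(20, T4_WATTS),
        "16x Tesla V100 150W": gpu_power(16, 150),
        "8x 300W double-width": gpu_power(8, 300),
    }
    for name, watts in configs.items():
        share = watts / total_supply()
        print(f"{name}: {watts} W ({share:.0%} of nameplate supply)")
```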
With one small image, the Supermicro SYS-6049GP-TRT became one of the most intriguing AI inferencing servers available in the NVIDIA Tesla ecosystem, assuming it supports both NVIDIA Tesla T4 and NVIDIA Tesla V100 (150W) GPUs.