Dell EMC PowerEdge R750xa Review: Low-Density Acceleration

Dell EMC PowerEdge R750xa Internal Hardware Overview

Inside the system, we start with the GPUs and storage at the front, then the fan partition, followed by the CPUs and memory, and finally the rear I/O and power supplies.

Dell EMC PowerEdge R750xa Internal Overview

Dell’s configuration puts the GPUs at the front of the chassis for the coldest possible airflow. Since they sit on risers without actually attaching to an I/O plate at the front of the system, these slots are mostly going to be used for headless accelerators like the NVIDIA A100. Some systems attach GPU I/O plates to either the front or rear of the chassis so high-end FPGAs and other accelerators can expose their onboard fabric ports. Still, our sense is that NVIDIA GPUs are going to be the top sellers in this system, so most customers will not mind this design.

Dell EMC PowerEdge R750xa NVLINK Bridges

One of the really nice features of Dell’s configuration, which differs from the PowerEdge C4140 and some competitive systems, is that by stacking the GPUs, Dell can use NVLink bridges between pairs of GPUs.

Not only can Dell support the higher-end NVIDIA A100 PCIe cards, but this also allows pairs of A100s on the same CPU to use NVLink for communication instead of PCIe. The C4140 with SXM GPUs and the PowerEdge XE8545 allow all GPUs to talk via NVLink, bypassing a major bottleneck in CPU-to-CPU traffic (e.g. UPI/ Infinity Fabric). The XE8545 also supports more PCIe lanes, more cores, and higher-TDP/ higher-performance NVIDIA A100s. While both have four A100s listed in the spec sheet, the XE8545 is the higher-end solution by a long shot.

2x NVIDIA A100 PCIe With NVLink Bridges Installed
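
As a quick sanity check that a bridged pair is actually talking over NVLink rather than falling back to PCIe, one can query NVML from the host. The snippet below is a minimal sketch rather than anything from Dell's documentation; it assumes the nvidia-ml-py (pynvml) Python bindings are installed and that the driver can see the GPUs, and it simply walks each GPU's NVLink links and prints the PCI address of the peer on any active link.

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        # On A100 PCIe cards with bridges, active links should only appear
        # between the two GPUs in each bridged pair, not across all four
        # GPUs as on the SXM-based systems.
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
            except pynvml.NVMLError:
                break  # no more NVLink links on this device
            if state == pynvml.NVML_FEATURE_ENABLED:
                peer = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(handle, link)
                print(f"GPU{i} ({name}) NVLink {link} -> peer PCI {peer.busId}")
finally:
    pynvml.nvmlShutdown()

Running nvidia-smi topo -m should tell a similar story, with an NV-class link shown only between the GPUs in each bridged pair and PCIe (plus UPI for cross-CPU pairs) everywhere else.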

Perhaps the biggest challenge in Dell’s design is that replacing GPUs is not a quick process. While it may look like one simply hits the unlock button and pops out the GPU, Dell instead has a nine-step process, spanning almost the entire chassis, printed on the label of each GPU carrier.

Dell EMC PowerEdge R750xa GPU Handle

Each GPU is secured to a two-GPU riser. Power and data connections are then fed to each GPU.

Dell EMC PowerEdge R750xa Tough Rear Cable Situation

Dell’s design uses a cable access channel in the middle of the fan partition as well as cables routed along the side of the chassis. With no room to service the GPU in place and disconnect cables easily, one is forced into a 9-step process to liberate the GPU riser, plus any replacement steps, and another 9-step process to put everything back together. The sheer number of components and cables made this one of the re-installations most prone to misalignment of any Dell server we have tested.

Dell EMC PowerEdge R750xa GPU Removal Big Server Disassembly

Part of the challenge is that the steps are basically:

  • Disconnect the GPU cables from the motherboard side at the rear of the chassis. That often means the SAS/ NVMe connections as well.
  • Unhook the GPU and SAS/ NVMe connections, as needed, from the fan shroud
  • Remove the fan shroud from over the CPUs and memory
  • Open the cable access door
  • Remove the GPU and SAS/ NVMe connections from the cable access channel
  • Remove the entire fan partition
  • Remove the cables from the tie-down hooks on the channels at either end of the fan partition once it is removed
  • Remove connections from the front of the motherboard
  • Hit the button and use the latch to remove the riser itself

After all of this is done, there are still more steps to disassemble the riser and get to the GPUs. Also, because the fan shroud has to be removed, connections for the other GPUs and the drives need to be detached. When putting everything back together, there is a higher chance of collateral damage or simply missed re-connections because of this design.

Dell EMC PowerEdge R750xa Front Looking Back Center Cable Channel

This is a very strange feeling. Dell went to great lengths to make all of this fit well together. The entire system is even color-coded in black or grey so CPU-GPU connections can be traced easily through the system. At the same time, having tested so many GPU systems from different vendors, we see a huge gap in how easy it is to service the GPU/ accelerator risers in the R750xa versus competitive systems. Dell’s customers sitting inside the Dell ecosystem may not know any better, but with a broader background in these systems, the R750xa’s GPU riser situation is an absolute nightmare to service.

The fans, on the other hand, are great. Except for the cable access through the middle of the fan partition, everything here is very easy to service. The entire partition is removable via a dual-lever system. The fans themselves are dual-fan modules and are probably some of the easiest to swap in the entire industry.

Dell EMC PowerEdge R750xa Hot Swap Dual Fan Module

This system uses dual Intel Xeon Scalable "Ice Lake" processors, and Dell can span to high-end configurations because so much effort was put into cooling. Each processor gets 8 channels of memory with two DIMMs per channel for a total of 16 DIMM slots per CPU, or 32 per system. One can use Intel Optane PMem 200 DIMMs in this system, but then the system is limited to 30C environments instead of 35C.

Dell EMC PowerEdge R750xa Dual CPU And 32 DIMMs

Dell supports up to 270W TDP CPUs like the Intel Xeon Platinum 8380 in this system as well.

Dell EMC PowerEdge R750xa Intel Ice Lake Xeon

As a quick aside, while we mentioned the 9-step process for GPU/ accelerator riser access, there is another important aspect to the design. With the connectivity passing over the CPU and memory airflow guide, one has to remove the cables to get the airflow guide out. That makes servicing a failed DIMM or upgrading memory capacity a much longer task than in other PowerEdge servers or in competitive systems.

Dell EMC PowerEdge R750xa The Challenge With Center Cables 2

The airflow guides are very nice. They are color-coded to the CPUs, and those colors match the PCIe cables associated with each GPU. There are also tie-down hooks to secure the center channel cables. This was actually a great concept. It just leads to longer and higher-risk servicing of the system.

Dell EMC PowerEdge R750xa Centerline With Airflow Guides

You may have noticed that the two colors of the shroud are joined at the center. There are tabs that look like they would allow you to service one side without having to remove the other. That is not the case, which is a total bummer since it means the system cannot at least be serviced in halves.

Dell EMC PowerEdge R750xa Split CPU And Memory Baffle

We tried to see if there was a way to do this, but it seems like a lost cause.

Dell EMC PowerEdge R750xa The Challenge With Center Cables 1

This is an interesting block for a few reasons. First, the blue cables here go in different directions: one goes through the center channel, the other around the side of the chassis. Second, one has to be careful reassembling this block because the many push tabs are close together, and a slight nudge can knock them loose. This is a bigger challenge because the design requires these to be removed for servicing the memory or GPUs.

Dell EMC PowerEdge R750xa Center Cables

One interesting note here is that while the two rear risers go to different CPUs, they are both low-profile designs. That means one cannot use higher-end networking like the NVIDIA BlueField-2 100Gbps DPUs that are full-height cards. While there are different riser options, this is also driven by the configuration of the rest of the system since there are only so many PCIe lanes to go around and only so much cooling available.

Dell EMC PowerEdge R750xa OCP NIC 3.0 Slot And X16 Slot
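
Because lane allocation depends on the riser configuration chosen, it can be worth verifying what each slot actually negotiated once a system is racked. The short sketch below assumes a Linux host and simply reads the standard sysfs link attributes for every PCIe device, which is enough to confirm that the GPUs and the OCP NIC 3.0 slot are running at the expected width and generation.

from pathlib import Path

# Read the negotiated PCIe link width and speed for each device. Not every
# PCI device exposes these attributes, so skip the ones that do not.
for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    width = dev / "current_link_width"
    speed = dev / "current_link_speed"
    if width.exists() and speed.exists():
        print(f"{dev.name}: x{width.read_text().strip()} @ {speed.read_text().strip()}")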

Next, we are going to look at MIG, management, and performance before getting to our final words.
