Inspur Systems NF5468M5 Review 4U 8x GPU Server


Inspur NF5468M5 Storage Performance

We tested a few different NVMe storage configurations because this is one of the Inspur Systems NF5468M5's key differentiation points. Previous generation servers often utilized a single NVMe storage device, if any at all. There are eight SAS3/SATA bays available, but we assume those are being used for OS and bulk storage given the system's design. Instead, we are testing the four NVMe drives that will likely be used for high-performance storage.

4x P4600 P3700 P3520 P3320 Aggregate NVMe Throughput

Here we see the impressive performance. With the Intel Xeon E5-2600 V4 generation of PCIe-based deep learning training servers, one was typically limited to a single PCIe slot for NVMe storage. That meant a single device, generally 1.6TB to 2TB in size.

We are using four 3.2TB devices here, which are not even the highest-capacity U.2 NVMe SSDs on the market. Even so, we have 12.8TB of NVMe storage, or over six times what we saw from high-end previous generation systems. That means one can store more data locally. What is more, one has more local NVMe bandwidth feeding the GPUs, which can lead to higher efficiency in many scenarios.
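For readers who want to run a similar test, here is a minimal sketch of how aggregate read throughput across four U.2 drives could be collected with fio driven from Python. The device paths, block size, and runtime are assumptions for illustration, not the exact job file we used.

import json, subprocess

DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"]  # assumed device paths

def fio_cmd(dev):
    # Sequential 128K reads, direct I/O, 60 second run, JSON output for easy parsing.
    return ["fio", "--name=seqread", f"--filename={dev}", "--rw=read",
            "--bs=128k", "--iodepth=32", "--ioengine=libaio", "--direct=1",
            "--time_based", "--runtime=60", "--output-format=json"]

# Launch one fio process per drive so all four devices are read concurrently.
procs = [subprocess.Popen(fio_cmd(d), stdout=subprocess.PIPE, text=True) for d in DEVICES]

total_kib_s = 0
for p in procs:
    out, _ = p.communicate()
    total_kib_s += json.loads(out)["jobs"][0]["read"]["bw"]  # fio reports bandwidth in KiB/s

print(f"Aggregate sequential read: {total_kib_s / 1024 / 1024:.2f} GiB/s across {len(DEVICES)} drives")

Running the four jobs concurrently matters here; benchmarking the drives one at a time and adding the results would not expose any shared PCIe or CPU bottlenecks.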

Inspur NF5468M5 Networking Performance

We loaded the Inspur Systems NF5468M5 with a number of NICs. For the main networking NICs, we used Mellanox ConnectX-5 100GbE/EDR InfiniBand NICs, but we also utilized dual Mellanox ConnectX-4 Lx 25GbE NICs.

Inspur NF5468M5 Networking Performance

Networking is an important aspect, as CSPs commonly deploy 25GbE infrastructure, and many deep learning clusters use EDR InfiniBand or 100GbE as their fabric of choice for moving data from the network to the GPUs.

Dual Mellanox ConnectX-5 In Inspur NF5468M5
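As a rough illustration of how links like these can be exercised, below is a minimal sketch using iperf3 from Python. The server address, stream count, and duration are assumptions rather than our exact test parameters, and an iperf3 server must already be listening on the far end.

import json, subprocess

SERVER = "192.0.2.10"   # assumed address of an iperf3 server on the 100GbE fabric
STREAMS = 8             # multiple parallel streams are typically needed to approach line rate

# Client side; the far end must already be running "iperf3 -s".
out = subprocess.run(
    ["iperf3", "-c", SERVER, "-P", str(STREAMS), "-t", "30", "--json"],
    capture_output=True, text=True, check=True,
)
result = json.loads(out.stdout)
gbps = result["end"]["sum_received"]["bits_per_second"] / 1e9
print(f"Received throughput: {gbps:.1f} Gbps over {STREAMS} parallel streams")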

Inspur Systems NF5468M5 GPU Burn

One of the areas we wanted to test with this solution was the ability to effectively cool eight NVIDIA Tesla V100 32GB GPUs. We ran GPU burn and took temperature samples over the course of the run.

Inspur Systems NF5468M5 GPU Burn Temperature Over Time

Overall, the system cooling kept the GPUs in acceptable ranges. This performance takes more than just fans. Consistent performance over time requires that aspects such as cabling and NIC placement be designed to maintain proper airflow. The Inspur Systems NF5468M5 scores well with GPU Burn, which is a worst-case scenario.
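For those who want to run a similar check on their own hardware, here is a minimal sketch of how GPU temperatures could be logged while GPU burn runs in another session. The sample count, interval, and output file name are assumptions for illustration, not our exact logging setup.

import csv, subprocess, time

SAMPLES = 60       # number of samples to take
INTERVAL_S = 30    # seconds between samples; both values are illustrative, not our exact cadence

with open("gpu_burn_temps.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["elapsed_s"] + [f"gpu{i}_temp_c" for i in range(8)])
    start = time.time()
    for _ in range(SAMPLES):
        # nvidia-smi prints one temperature (in Celsius) per line, one line per GPU.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        temps = [int(t) for t in out.stdout.split()]
        writer.writerow([round(time.time() - start)] + temps)
        time.sleep(INTERVAL_S)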

Next, we are going to take a look at the Inspur Systems NF5468M5 power consumption before looking at the STH Server Spider for the system and concluding with our final words.

9 COMMENTS

  1. Ya’ll are doing some amazing reviews. Let us know when the server is translated on par with Dell.

  2. Amazing. For us to consider Inspur in Europe, the English translation needs to be perfect, since we have people from 11 different first languages in IT. Because we are international, our corporate standard is English. English isn't my first language, so I can see why some of that looks a little off. They need to hire you or someone to do that final read and editing, and then we would be able to consider them.

    The system looks great. Do more of these reviews

  3. Can we see a review of the Asus ESC8000 as well? I have not found any other GPU compute design that offers the choice in the BIOS between single and dual root like Asus does.

  4. How is the performance affected by CVE-2019-5665 through CVE-2019-5671 and CVE-2018-6260?

  5. The P2P bandwidth testing result is incorrect; the above result should be from an NVLink P100 GPU server, not a PCIe V100.
