Using Accelerators for Big Gains
One of the questions we got asked after the piece a few weeks ago on using accelerators for AI, crypto, and HPC workloads was about the power consumption delta. Since we did that last piece in AWS, we did not have access to that data. So to get that data, we simply used the Inspur server running the same workloads to see what we would get. With that, we saw some fairly huge differences.
A key note on this is the methodology. One of the big challenges with doing accelerator v. no-accelerator is simply getting a useful comparison. We tend to run shorter workloads that take minutes or hours, but real-world systems run 24×7. Further, if workload A using acceleration Aa and no-acceleration An perform at different rates, and over different time periods, then it is harder to compare power consumption between Aa and An. What we did was simply to average power consumption over the main run. We tied the data to the server sensor data plus also were monitoring with the trusty Extech 380803 TrueRMS power meters between the systems and PDUs, as well as the PDU. That gave us PDU data along with system PSU data that we could then tie into time intervals for the workloads. Each of these workloads has periods where they are running at a sustained performance level, and that is what we are using here. We can then get an average power consumption during the sustained portion and the average performance, and get some nice power/ performance ratios.
NAMD STMV AVX2 v AVX-512
Here we are using our STMV workload, and specifically, we are going to use the non-Tiled AVX-512 result versus just using AVX2 to keep this as apples-to-apples as we can get. Of course, as we saw in the previous piece, the AVX-512 Tiled results are much better, but we wanted to show this as more of low-hanging fruit.
What was most interesting about this one was that the power consumption was close. The AVX2 results actually had lower power consumption than the AVX-512 results. Something to remember here is that especially at these power levels, fans are spinning at high speeds, and thus cooling is a much higher percentage of system power as we saw in our 1U v. 2U results.
Still, when it comes to the amount of work for the power used, the AVX-512 version was clearly better, and by a significant amount. In a world where getting 1-2% on power supplies or fan differences matters. Getting something this large is borderline astounding. This is also why many of the higher-end systems are looking at transitioning to liquid cooling because fans and air cooling are using so much of the total system power here.
Tensorflow with DL Boost
The second workload we looked at was Tensorflow and again we are using the DL Boost acceleration in the Ice Lake systems.
Here again, we saw relatively similar power consumption, but we also saw a huge delta in terms of performance. The net impact was a huge efficiency delta. Most of this is driven by being able to do Int8 instead of FP32. Still, this is an example where the built-in accelerator really drives a huge efficiency gain.
Let us take a second and talk accelerators here. The reason the fan discussion is very important is that a lot of the time folks talk about accelerators in terms of package power consumption. We do it as well. When we look at the NVIDIA T4 that will be part of the NF5180M6’s review configuration, nvidia-smi will tell us the power consumption of that passively cooled GPU. What it is not doing is taking into account the power consumption of the chassis fans that have to cool the GPU. As we have seen, when we are using 15% of incremental system power for cooling, the actual impact of using PCIe accelerators is higher than many discuss. The challenge, of course, is that while major vendors all have the ability to adjust fan speeds to cool passively cooled PCIe accelerators, the algorithms are different and things like how many drives are populated start to matter. Still, when an accelerator claims board power consumption is the total power consumption impact, as a STH reader, you will now know that it is missing a large chunk of system power impact.
WordPress NGINX PHP-FPM
In terms of WordPress, we saw a bit of a different outcome because we actually had lower power in our setup and higher performance.
This was a less exciting one than we expected. In terms of power consumption, overall power consumption seemed much lower, but that was an impact of the averaging over the run. As we saw previously, this is not the biggest performance gain because the acceleration is only on a portion of the workload. At the same time, this is good.
As we have gone through this exercise, a few insights stood out.
- Getting the power situation sorted, with high-quality power supplies, even sticking to AC power and normal 200-240V power can yield 3-6% lower power consumption. This is an important one to get right.
- As systems get to higher power levels, we will see 15%+ of server power go to cooling. Using 2U chassis instead of 1U chassis means we can get some of that power back. Ultimately, this will push dense systems to liquid cooling especially as that hits 20% levels on higher-end systems.
- The impact of using PCIe accelerators is more than just package power. Incremental cooling costs become a significant portion of the power budget, especially in higher-end systems. In lower-power systems where cooling is a lower overall percentage of power consumption the impact of adding a dedicated accelerator is less.
- Ultimately, having accelerators in CPUs is about more than just performance gains. The power consumption benefits of using the accelerators can be enormous. We used the Intel Ice Lake generation CPUs here because they are basically the first generally available CPUs with this level of acceleration. Other vendors will add their own accelerators over time. We also know that the next-generation Sapphire Rapids Xeons will have more built-in acceleration for AI and other workloads because the performance and efficiency gains are so high.
What we presented here is likely going to be different than what you will see with other server vendors, CPU SKUs, configurations, and so forth. Still, the basic principles should apply. According to Digital Realty, one of the largest data center operators in the world, about 3% of global electricity output goes to power data centers. Using technology that exists today to get 10%+ better power efficiency not only saves money but also has a direct impact on the environmental burden of the world’s compute infrastructure.
Everyone has limitations when it comes to facilities, budgets, and so forth, but this is an area where we are going to urge our readers to take notice of and think about the impacts of seemingly small choices in power efficiency.