About a month ago we published the first third-party review of the Cavium ThunderX2. For those who missed our coverage, the Cavium ThunderX2 is a generally available, 64-bit Arm server CPU that is competitive with Intel Xeon Skylake-SP and AMD EPYC in features and performance. You can read our review, Cavium ThunderX2 Review and Benchmarks a Real Arm Server Option, where we go in depth into the chips' performance as well as the broader ecosystem and usability aspects of going Arm. Many of our readers noted that we punted a bit on power consumption figures. We are now ready to share more about what was going on, along with Cavium ThunderX2 power consumption data.
Gigabyte R281-T90 “Sabre” Test Platform Mysteries
In our initial review, we noted that the Gigabyte Sabre test platform was giving us power numbers that seemed out of place:
Our Gigabyte/Cavium ThunderX2 Sabre development platform hit a peak of 823W at 100% load. We think that there are likely optimizations that can occur at the system’s firmware level, and by using GA power binned chips. At first, we thought that these numbers were way out of line, so we discussed them with Cavium and that is when we were told that the ~800W range was correct for our system and pre-production chips. The company also told us that the production systems will have firmware that is better power optimized. As a result, we are not going to publish a direct comparison until we can get a mature Cavium ThunderX2 platform with production chips and system firmware. This may take some time, but publishing a comparison using the Sabre platform and the unbinned silicon is disingenuous.
We knew something was “still in development” because we could hear and feel the server in our racks. The rack housing the test platform also held a blade chassis with 12kW of installed PSUs and DeepLearning11, a 10x GPU, dual Intel Xeon deep learning server capable of sustaining over 4kW of power consumption. The ThunderX2 platform sits just above the Gigabyte R281-G30 Versatile Compute Platform we reviewed, which uses a similar layout and chassis design. All three of those systems were pushing significantly less air per U of rack space than the development box.
It turns out that our airflow perception was correct. The fans were consistently spinning at well over 15,000 rpm. Using the Intel Xeon Skylake-SP based Gigabyte server racked just above as a reference, we knew these fan speeds were significantly higher than they should have been. Something was not right.
We brought our findings to Cavium and were told that our Sabre platform did not have production firmware. The focus of the Sabre platform was to get a functional system running on the new architecture. That makes sense, but it meant that the normal power optimization steps had not yet taken place. We eventually received a firmware update that brought the system to a newer firmware revision, and we set to testing.
After a full power off, we fired up the system to get some power figures.