AMD Ryzen Threadripper PRO 3995WX Performance
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench. We are also going to share some of our workstation benchmarks from our Lenovo ThinkStation P620 review to highlight how the configuration we tested is similar to a traditional dual Intel Xeon Scalable system.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org and a standard auto-generated configuration file, then run make utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
Plenty of cores and TDP means we see a solid figure here. We also get a nice benefit from the additional memory bandwidth.
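As a rough illustration of how a compiles-per-hour figure can be derived, here is a minimal Python sketch. It is not the Linux-Bench script itself. The function names, the `source_dir` parameter, and the plain `make -jN` invocation are assumptions for illustration, and the kernel tree is assumed to be unpacked and configured already.

```python
import os
import subprocess
import time
from typing import Optional


def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one full compile into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_kernel_build(source_dir: str, jobs: Optional[int] = None) -> float:
    """Time a single `make -jN` run in source_dir and return seconds elapsed.

    Hypothetical helper: assumes an already-configured kernel tree and
    defaults to one make job per logical CPU reported by the OS.
    """
    jobs = jobs or os.cpu_count() or 1
    start = time.perf_counter()
    subprocess.run(
        ["make", f"-j{jobs}"],
        cwd=source_dir,
        check=True,
        stdout=subprocess.DEVNULL,
    )
    return time.perf_counter() - start
```

For example, a compile that finishes in 90 seconds works out to 40 compiles per hour. Reporting the inverse of elapsed time means higher bars are better, which keeps the chart consistent with the other throughput-style results.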
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
This test relies much less on memory bandwidth, and interestingly we see the Threadripper 3990X show a small gain. Logically, this makes sense given we have very similar compute complexes.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
While the AMD Ryzen Threadripper 2990WX also had the “WX” moniker, we see here a result that shows a bit of why it was not necessarily a clear winner in the workstation market. Perhaps it was an attempt to get OEMs to bite on AMD platforms for their professional workstations that did not work at the time.
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. Here are the comparison results for the legacy data set:
Here we got absolutely great results. There is not much more we can say about this.
OpenSSL is widely used to secure communications between servers. This is an important library in many server stacks. We first look at our sign tests:
Here are the verify results:
One may notice we have been adding a number of different dual Intel Xeon offerings in these charts. We also have the single-socket Intel Xeon W-3275, Xeon W-2295, and Xeon W-1290P just to show those levels of single-socket performance. To be clear, in this market many will have per-core software licensing and not necessarily want a 64-core CPU. Still, for the segment that is scaling to higher-core counts, this is impressive.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
We are trying to phase out these results, but since this is effectively a 2019-2020 era chip, we wanted to still include them here for comparison purposes.
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Something that we will note is that in the EPYC 7003 “Milan” generation we get a large speedup here from BMI2 performance. Since these parts are launching around the same time that Milan is at hyper-scalers, we double-checked the results here and confirmed that they are more in line with the “Rome” generation.
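For readers who want to check what their own system advertises, here is a minimal Python sketch that looks for the `bmi2` flag in `/proc/cpuinfo`. It assumes an x86 Linux system, and the function names are ours for illustration. Keep in mind that the flag only says the instructions are available, not how fast they execute, so it cannot by itself explain a generational speedup like the one described above.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags found in /proc/cpuinfo-style text.

    Only the first "flags" line is read, which assumes every core
    reports the same feature set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Read and parse the flags from the running system (x86 Linux assumed)."""
    with open(path) as handle:
        return cpu_flags(handle.read())
```

On a Linux box, `"bmi2" in read_cpu_flags()` gives a quick yes/no answer before running a chess engine build that depends on those instructions.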
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
Here there are a few key points. First, the higher memory VMs are being impacted more by memory capacity than CPU performance. This is a workload geared more towards servers, and when we were profiling it we expected 256GB to be a minimum configuration. Still, as we move away from the most memory capacity bound side to the more CPU/memory bandwidth bound side, we get a nice speed-up versus the Threadripper 3990X. A lot goes into the performance here, including memory capacity, bandwidth, and CPU performance, and it is nice to see all of those showing impact.
Workstation SPECworkstation 3.0.2
SPECworkstation 3 has been updated to 3.0.2 which measures the 3D graphics performance of systems running under the OpenGL and DirectX application programming interfaces. As a result of the new update, we cannot compare against past version 3 results, so we will show the screenshot of the results here and graph them in later reviews.
We just wanted to show the performance of these two workstations. William previously reviewed the Dell Precision T7920, and Dmitrij uses one that we customized for our router/firewall testing. Here we have Platinum 8260s, which were in our test configuration, but as we showed, the Intel Xeon Gold 6240R is probably the right Big 2nd Gen Intel Xeon Scalable Refresh SKU.
A Word on Power Consumption
Taking a quick look at power consumption, having a single CPU helps a lot here:
For most of our readers, overall performance, performance per core, or other metrics will likely take precedence over power consumption, but it is still worth noting. The AMD Threadripper 3995WX is not targeting the Xeon W-3275 as much as it is the dual Xeon workstation market.
On that market note, we are going to discuss the market impact followed by our final words.