AMD Ryzen Threadripper 3970X Review 32 Cores of Madness


AMD Ryzen Threadripper 3970X Linux Benchmarks

For this exercise, we are using our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have been using for years, as well as several results from our updated Linux-Bench2 scripts. At this point, our benchmarking sessions take days to run and generate well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data, but we also have services that let companies run their own workloads in our lab, such as our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not on a test bench.

We are going to show off a few results, and highlight a number of interesting data points in this article.

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org with a standard configuration file and build the standard auto-generated configuration using every thread in the system. We express results in compiles per hour to make them easier to read:

AMD Threadripper 3970X Linux Kernel Compile Benchmark

If you are a software developer who is constantly doing local compile work, this chart should say a lot. Not only is the AMD Ryzen Threadripper 3970X almost twice as fast as the previous generation 2990WX, but it is also getting close to 3x the speed of the 16-core Threadripper 1950X. If you have a system that has been running for the last two years, a new workstation may bring massive performance improvements. Given the performance gains, this is one area where one can make the business case that a new system will see a positive ROI within even a 30-day window. That is spectacular.
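The compiles-per-hour metric is a straightforward conversion from wall-clock build time. As a minimal sketch, assuming a build is timed in seconds, the conversion and the resulting generation-over-generation ratio could be computed as follows. The function names and the sample timings are hypothetical and are not taken from our measured results:

```python
def compiles_per_hour(build_seconds: float) -> float:
    """Convert one kernel build's wall-clock time into compiles per hour."""
    if build_seconds <= 0:
        raise ValueError("build_seconds must be positive")
    return 3600.0 / build_seconds


def speedup(new_rate: float, old_rate: float) -> float:
    """Ratio of two compiles-per-hour figures (above 1.0 means faster)."""
    return new_rate / old_rate


# Hypothetical timings, for illustration only (not measured values).
old_rate = compiles_per_hour(120.0)  # a 120 second build
new_rate = compiles_per_hour(60.0)   # a 60 second build
print(f"{old_rate:.1f} -> {new_rate:.1f} compiles/hour, "
      f"{speedup(new_rate, old_rate):.2f}x")
```

Because the metric is a rate rather than a time, higher is better, and the ratio between two bars on the chart reads directly as a speedup.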

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.

AMD Threadripper 3970X C Ray 8K Benchmark

Like Cinebench on the Windows side, our c-ray results tend to skew toward the AMD architecture. We added the AMD EPYC 7302P here just to give some guidance as to how these compare to our AMD EPYC 7002 figures. The additional clock speed and TDP yield more than a 2x performance advantage over the EPYC part. In other words, the Threadripper cores are running at higher clocks and doing more work per core.

7-zip Compression Performance

7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

AMD Threadripper 3970X 7zip Compression Benchmark

If you are still running a dual Intel Xeon E5 V4 workstation, this is an opportunity to consolidate to a single CPU solution. You can see here performance that is almost 3x what a dual Intel Xeon E5-2630 V4 (10 cores each) system can do on this test.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench in the near future. With GROMACS, we have been working hard to support both Intel’s Skylake AVX-512 and the AVX2-supporting AMD Zen architecture. Here are the comparison results for the legacy data set:

AMD Threadripper 3970X NAMD Benchmark

We found this fascinating. While using AVX2 and AVX-512 can change this picture, when we run code using lower levels of optimization for both architectures, the AMD solution performs very well. Here, dual Intel Xeon Gold 6242 CPUs are no match for the AMD Ryzen Threadripper 3970X. The HEDT segment is cannibalizing the dual Xeon workstation market, and this is a great example. Those are frequency-optimized Xeons designed for per-core licensing. Still, at 32 cores vs. 32 cores, the AMD solution is performing better.

Sysbench CPU test

Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.

AMD Threadripper 3970X Sysbench CPU Multi Thread Benchmark

Again, we are seeing great performance in sysbench from the Threadripper 3970X.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. It is an important component of many server stacks. We first look at our sign tests:

AMD Threadripper 3970X OpenSSL Sign Benchmark

Here are the verify results:

AMD Threadripper 3970X OpenSSL Verify Benchmark

OpenSSL is considered a foundational web technology. Having a lot of cores means that the AMD Ryzen Threadripper 3970X performs well here. Performance per core is also higher here than with the Xeon W-3275. We also wanted to note that this is a great example of where the earlier dual Xeon E5 V1/V2 workstations can be consolidated into a single socket. The dual Intel Xeon E5-2670 V1 solution has half as many cores as the single Threadripper 3970X but is only performing at about one quarter the speed.
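The consolidation point is easier to see when throughput is normalized by core count. As a rough sketch, using only the approximate ratios stated above (half the cores, about one quarter the speed) rather than measured chart values, the per-core comparison works out like this. The function name is hypothetical:

```python
def per_core_ratio(score_a: float, cores_a: int,
                   score_b: float, cores_b: int) -> float:
    """Per-core throughput of system A relative to system B."""
    return (score_a / cores_a) / (score_b / cores_b)


# Approximate ratios from the text, not measured values:
# dual Xeon E5-2670 V1 = 16 cores at ~0.25x the speed of the
# 32-core Threadripper 3970X (normalized to 1.0).
ratio = per_core_ratio(score_a=0.25, cores_a=16, score_b=1.0, cores_b=32)
print(f"Older system per-core throughput: {ratio:.2f}x of the 3970X")
```

Under those assumptions, each older core delivers roughly half the throughput of a Threadripper 3970X core, so the gain comes from both the higher core count and more work done per core.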

UnixBench Dhrystone 2 and Whetstone Benchmarks

Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used, so we are including it in this data set. Here are the Dhrystone 2 results:

AMD Threadripper 3970X UnixBench Dhrystone 2 Benchmark

Here are the Whetstone results:

AMD Threadripper 3970X UnixBench Whetstone Benchmark

We added some varied results to this chart. Perhaps the most notable is that the AMD Ryzen Threadripper 3970X actually pulled ahead of the dual 32-core ThunderX2 parts. Similarly, even the dual Intel Xeon E5-2699 V4, the top bin from early 2017, is behind the new Threadripper part. On the floating point side, the Intel Xeon W-3275 is able to claim a victory.

Chess Benchmarking

Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:

AMD Threadripper 3970X Chess Benchmark

At this point, one can probably summarize the performance figures by saying that, with 32 cores, the AMD Ryzen Threadripper 3970X performs at or above the level of previous generations and the Intel competition. Still, the Intel Core i9-10980XE generally maintains a level of performance well beyond 50% of the AMD Ryzen Threadripper 3970X. This is despite the Core i9-10980XE having a list price less than half that of the 3970X. When looking at a system, if one does not need to scale this high, the Core i9 may actually be a great option.
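The price/performance argument above can be made concrete with a small calculation. This is a sketch under stated assumptions: the prices are launch list prices as we understand them (USD 1,999 for the 3970X and USD 979 for the Core i9-10980XE), the 0.55 relative performance is a hypothetical stand-in for "well beyond 50%" rather than a measured result, and the function names are illustrative:

```python
def break_even_performance(price_cheaper: float, price_pricier: float) -> float:
    """Relative performance the cheaper part needs to match the
    pricier part on performance per dollar."""
    return price_cheaper / price_pricier


def value_ratio(rel_perf_cheaper: float,
                price_cheaper: float, price_pricier: float) -> float:
    """Performance per dollar of the cheaper part relative to the pricier
    part, with the pricier part's performance normalized to 1.0."""
    return (rel_perf_cheaper / price_cheaper) / (1.0 / price_pricier)


# Assumed launch list prices in USD; treat these as assumptions.
PRICE_10980XE = 979.0
PRICE_3970X = 1999.0

# Hypothetical relative performance standing in for "well beyond 50%".
rel_perf = 0.55

print(f"Break-even: {break_even_performance(PRICE_10980XE, PRICE_3970X):.1%}")
print(f"Value ratio at {rel_perf:.0%}: "
      f"{value_ratio(rel_perf, PRICE_10980XE, PRICE_3970X):.2f}x")
```

Any workload where the Core i9 stays above the break-even figure favors it on performance per dollar for the CPU alone. The calculation ignores platform cost and the absolute time saved by the faster part, which is often what matters for a workstation purchase.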

The performance summary is, of course, that the Threadripper 3970X is an excellent performer on both Windows and Linux.

Next, we are going to look at the power consumption before getting to our final thoughts.

12 COMMENTS

  1. Really a shame about those RDIMMs. For this reason I’m going to have to get an EPYC at lower clocks for a workstation I’ll be getting next year instead of a TR. It’s a shame, really.

    Totally agree about the platform thing. I’m not switching out CPUs in $6000+ computers.

  2. How were the CPU temps with the noctua-nh-u14s-tr4-sp3? I am surprised that an air cooler could handle this monster!

  3. Any tests that showcase performance for single-threaded, math-heavy operations? I had to dump a previous Threadripper build because it hugely lagged behind Intel CPUs, mostly due to the absence of AVX2. Since then I have never touched AMD again. I am happy to revisit, but I would like to see how it performs in single threads that require matrix computations and many millions of mathematical operations per second, ideally vectorized. Any such tests?

  4. @John Lee Could you please make the textual output from lscpu available? I don’t want to be typing all these abbreviations by hand yet I want to see how many different features does it have compared to my trusty TR1920X. Thanks!

  5. By the way, does anyone know what the situation is with main memory encryption and encrypted memory for virtual machines with this generation of Threadripper? The first generation showed support in the CPU flags but was missing something else from the BIOS, so it didn’t (wasn’t supposed to) work. It’s a dick move by AMD to not support them on Threadripper, IMO, and I wonder if they kept it.

  6. Thank you for a great review as always. I appreciate the inclusion of SPECworkstation, lots of programs there I use in the HPC world. I need to do some digging on my own to figure out how they build their tests though. Some of those programs are a mess of potential different libraries, MPI,BLAS,LAPACK,FFTW, etc.

    Also I’d love to see some RandomX benchmarks like you did for Epyc. The 3970X should be perfect for it, I expect 25-30kh/s. While I’m asking, a deep dive on the cache would be interesting too, I’ve been seeing some results around online indicating there may be architectural differences in Zen2 Threadripper’s cache access vs Zen2 Ryzen.

  7. Threadripper comes with an ECC caveat that’s if the Motherboard maker chooses to support it and then that ECC support is somewhat lacking compared to AMD’s Epyc branded SKUs. And the single socket Epyc P series of 7002 SKUs are still affordable with the MBs offering up more memory channels(8) and more PCIe lanes with the full vetting/certification for ECC memory types compared to any consumer Zen-2/MB based variants currently.

    There are a few benchmarks where the 3960X is performing on par or a little better than the 3970X. Could that be the result of the 3 out of 4 enabled CPU cores on the 3960X’s CCX units still getting access to the same amount of L3 cache as the 4 enabled cores on the 3970X’s CCX units, where the 4 enabled cores have effectively less L3 per core to share than on the 3960X? I hope there will be more testing of the cache subsystems on Zen-2 going forward for any SKUs that may have the full complement of L3 cache made available even though one or more cores per CCX unit are disabled, and of what workloads may benefit from having more total L3 cache per enabled core on the CCX.

    I’m really interested on seeing any testing done to confirm that for Zen-2 but Zen-3 will see AMD getting rid of the CCX construct altogether and making the CCD die/chiplet have its full Complement of L3 available to the full 8 cores instead of partitioning the CCD into 2 CCX Units. The big question for 8 cores per CCD and no CCX units besides less Infinity Fabric traffic needed to get at that larger shared pool of L3 cache on Zen-3’s CCD die/chiplet is will AMD switch to a Ring Bus configuration on the 8 core CCD or some more complicated topology for 8 cores versus the 4 cores/CCX construct that’s used currently.

    Both AMD and Intel appear to be going wider order superscalar with their respective core designs in order to get more IPC in the face of getting less in performance advantages with the newer smaller process nodes not able to yield as much generational clock frequency increases as in the past. So Zen-3 will have to go wider order superscalar and maybe have some AVX512 options as well. I’d love to see AMD Bring some L4 cache to the I/O die at some point in time for any workloads that really can benefit but that’s maybe something that will have to wait for Zen-4 with hopefully Zen-3 getting some larger shared per CCD Die/Chiplet L3 cache over what Zen-2 offers.

    Really the Epyc/SP3 motherboard warranty/support periods are much longer than any Consumer/Threadripper offerings and that has to factor in to TCO for any professional end users that can really also deduct Epyc’s higher up front costs as a business expense. And really as far as ECC CPU/MB partner support goes Epyc CPU/MBs are vetted/certified on all the professional software packages whereas Threadripper CPUs/MBs will have less testing/certification guarantees and less product support should that be needed from AMD and the SP3 Motherboard makers .

    Threadripper may be sufficient for some if they absolutely need the higher clocks and are not dependent on ECC for certain workloads and maybe that’s good enough for some but folks need to do some more in depth cost/benefit analysis that also factors in the CPU’s cost/per memory channel and cost/per PCIe lane as well as the MB’s cost/memory channel and cost/PCIe lane. And that can make Epyc/SP3 the better deal on a cost/feature basis.

  8. @Fabian, what has this to do with dirty tricks? Fact is that my math/linear algebra heavy programs on Intel CPUs ran circles around both the previous gen Threadripper and Epyc CPUs at otherwise identical frequencies and memory speeds. I could not care less what “games” anyone is playing when my back tests and other heavy math procedures finish in half the time on one CPU vs the other. I have been a very heavy AMD critic for math-heavy applications and have voiced such on this website multiple times. I am always happy to revisit to test new AMD products, but so far neither Epyc nor Threadripper has come even close in performance to Intel’s CPUs for math-heavy applications.

  9. @matt what Fabian pointed to is that if you simply force MATLAB to properly recognize the math abilities of the AMD CPU, it will run many more circles around the Intel chips… the AMD CPUs are faster on anything except a few AVX-512 special cases, so if you don’t see that, there is a good chance it’s your math library that is heavily underutilizing the AMD chip. Nothing to criticize AMD for; they can’t fix your code for you.
