AMD Ryzen Threadripper 3970X Review 32 Cores of Madness


AMD Ryzen Threadripper 3970X Windows Performance

First, we are going to move through our Windows suite before we proceed to our Linux testing. Instead of focusing on lower-end mainstream products, we are going to focus on higher-end parts.

AMD Ryzen Threadripper 3970X AIDA64 Memory Test

AIDA64 memory bandwidth benchmarks (Memory Read, Memory Write, and Memory Copy) measure the maximum achievable memory data transfer bandwidth.

AMD Threadripper 3970X AIDA64 Memory

The higher frequency helps here, but the Intel Xeon W-3275 performs well with its 6 channel memory.
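As a rough sanity check on why channel count matters, theoretical peak DRAM bandwidth is simply channels times bus width times transfer rate. The sketch below assumes the platforms' rated maximums (quad-channel DDR4-3200 for the Threadripper 3970X, six-channel DDR4-2933 for the Xeon W-3275), which may not match the exact DIMM configuration used in this test. The function name is ours, and measured AIDA64 figures will land below these ceilings.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Each DDR4 channel is 64 bits (8 bytes) wide and moves one bus-width
    word per transfer, so peak = channels * bytes * MT/s.
    """
    return channels * bus_bytes * mega_transfers_per_s / 1000


# Assumed configurations: rated platform maximums, not confirmed test settings.
platforms = {
    "Threadripper 3970X (4ch DDR4-3200)": (4, 3200),
    "Xeon W-3275 (6ch DDR4-2933)": (6, 2933),
}

for name, (channels, mts) in platforms.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s")
```

Under these assumptions, the two extra channels on the Xeon more than offset its lower memory clock, which is consistent with it performing well in this test.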

AMD Ryzen Threadripper 3970X Cinebench R15

Here are our Cinebench R15 results:

AMD Threadripper 3970X Cinebench R15

Generally, AMD performs very well here due to microarchitectural reasons. We are going to discuss Cinebench after showing off the newer R20 results.

AMD Ryzen Threadripper 3970X Cinebench R20

We have not previously run Cinebench R20 in our reviews but will start doing so going forward.

AMD Threadripper 3970X Cinebench R20

We got about 47% higher performance with the Threadripper 3970X over the 2990WX. That is ultra-impressive as an AMD on AMD generational 32 core improvement.

AMD Ryzen Threadripper 3970X Geekbench 4

Geekbench is a popular cross-platform benchmark that we run here on Windows.

AMD Threadripper 3970X Geekbench 4

Geekbench tends to struggle with higher core counts, and especially additional NUMA nodes. Here we see what is essentially a 2x improvement over the previous generation Threadripper.

AMD Ryzen Threadripper 3970X 3DMark PCI Express Bandwidth

3DMark feature tests are special tests designed to highlight specific techniques, functions or capabilities. The 3DMark PCI Express feature test is designed to measure the bandwidth available to your GPU over your computer’s PCI Express interface.

The test aims to make bandwidth the limiting factor for performance. It does this by uploading a large amount of vertex and texture data to the GPU for each frame. The result of the test is the average bandwidth achieved during the test.

AMD Threadripper 3970X PCIe Feature Test

Here we are using PCIe Gen3 for the GPU bandwidth tests since we are using an NVIDIA GPU, which is limited to Gen3. Still, results are solid versus Intel.

PCIe Gen4 Networking

PCIe Gen4 matters for more than just GPUs. A great example is what we saw with a Mellanox ConnectX-5 100GbE PCIe Gen4 adapter when we installed it in Gen3 and Gen4 capable systems.

Mellanox ConnectX 5 100GbE PCIe Gen3 And Gen4

We did not get to test the NIC in our normal switched architecture in the main data center lab, so this is from our EPYC 7002 runs. Still, we had the opportunity to do a quick test using direct attach to the workstation and saw results within +/- 5% of these on the early runs we did. It was enough to suggest we are seeing similar numbers on the 3rd generation AMD Threadripper parts, but we did not use our full normal test methodology.

In a workstation, this matters a bit less. One is unlikely, at the time of this publication, to have dual 100GbE demands to workstations. Still, it is a good proof point for why PCIe Gen4 is impactful: GPUs, networking, and storage can all benefit from the faster interface.
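To put the dual 100GbE point in numbers, here is a minimal back-of-the-envelope sketch comparing the raw bandwidth of a x16 slot at Gen3 and Gen4 with the line rate of a dual-port 100GbE adapter. It uses the standard per-lane rates (8 GT/s and 16 GT/s) and 128b/130b encoding, and it ignores packet and protocol overhead on both the PCIe and Ethernet sides, so real-world throughput will be lower. The function names are ours and the x16 link width is an assumption.

```python
# Per-lane signaling rates in GT/s for the two PCIe generations discussed.
GT_PER_S = {3: 8.0, 4: 16.0}
# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbs(gen: int, lanes: int = 16) -> float:
    """Raw one-direction PCIe link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


def ethernet_gbs(gbit_per_port: float, ports: int = 1) -> float:
    """Ethernet line rate in GB/s for the given number of ports."""
    return gbit_per_port * ports / 8


needed = ethernet_gbs(100, ports=2)  # dual-port 100GbE
for gen in (3, 4):
    link = pcie_gbs(gen)
    verdict = "enough" if link >= needed else "not enough"
    print(f"PCIe Gen{gen} x16: {link:.2f} GB/s, {verdict} for {needed:.1f} GB/s of dual 100GbE")
```

Even before overhead, a Gen3 x16 slot falls short of two 100GbE ports at line rate, while a Gen4 x16 slot has headroom.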

Next, we are going to look at Windows system performance benchmarks.

12 COMMENTS

  1. Really a shame about those RDIMMs. For this reason I’m going to have to get an EPYC at lower clocks for a workstation I’ll be getting next year instead of a TR. It’s a shame, really.

    Totally agree about the platform thing. I’m not switching out CPUs in $6000+ computers.

  2. How were the CPU temps with the noctua-nh-u14s-tr4-sp3? I am surprised that an air cooler could handle this monster!

  3. Any tests that showcase performance for single-threaded, math-heavy operations? I had to dump a previous Threadripper build because it hugely lagged behind Intel CPUs, mostly due to the absence of AVX2. Since then I have never touched AMD again. I am happy to revisit, but I would like to see how it performs in single threads that require matrix computations and many millions of mathematical operations per second, ideally vectorized. Any such tests?

  4. @John Lee Could you please make the textual output from lscpu available? I don’t want to type all these abbreviations by hand, yet I want to see how many different features it has compared to my trusty TR1920X. Thanks!

  5. By the way, does anyone know what the situation is with main memory encryption and encrypted memory for virtual machines on this generation of Threadripper? The first generation showed support in the CPU flags but was missing something else from the BIOS, so it didn’t (wasn’t supposed to) work. It’s a dick move by AMD to not support them on Threadripper, IMO, and I wonder if they kept it.

  6. Thank you for a great review as always. I appreciate the inclusion of SPECworkstation; there are lots of programs there I use in the HPC world. I need to do some digging on my own to figure out how they build their tests though. Some of those programs are a mess of potential different libraries: MPI, BLAS, LAPACK, FFTW, etc.

    Also I’d love to see some RandomX benchmarks like you did for Epyc. The 3970X should be perfect for it, I expect 25-30kh/s. While I’m asking, a deep dive on the cache would be interesting too, I’ve been seeing some results around online indicating there may be architectural differences in Zen2 Threadripper’s cache access vs Zen2 Ryzen.

  7. Threadripper comes with an ECC caveat that’s if the Motherboard maker chooses to support it and then that ECC support is somewhat lacking compared to AMD’s Epyc branded SKUs. And the single socket Epyc P series of 7002 SKUs are still affordable with the MBs offering up more memory channels(8) and more PCIe lanes with the full vetting/certification for ECC memory types compared to any consumer Zen-2/MB based variants currently.

    There are a few benchmarks where the 3960X performs on par with or a little better than the 3970X. Could that be the result of the 3 out of 4 enabled CPU cores on the 3960X’s CCX units still getting access to the same amount of L3 cache as the 4 enabled cores on the 3970X’s CCX units, so that the 3970X effectively has less L3 per enabled core? I hope there will be more testing of the cache subsystems on Zen-2 going forward for any SKUs that have the full complement of L3 cache available even though one or more cores per CCX unit are disabled, and of which workloads may benefit from having more total L3 cache per enabled core on the CCX.

    I’m really interested on seeing any testing done to confirm that for Zen-2 but Zen-3 will see AMD getting rid of the CCX construct altogether and making the CCD die/chiplet have its full Complement of L3 available to the full 8 cores instead of partitioning the CCD into 2 CCX Units. The big question for 8 cores per CCD and no CCX units besides less Infinity Fabric traffic needed to get at that larger shared pool of L3 cache on Zen-3’s CCD die/chiplet is will AMD switch to a Ring Bus configuration on the 8 core CCD or some more complicated topology for 8 cores versus the 4 cores/CCX construct that’s used currently.

    Both AMD and Intel appear to be going wider order superscalar with their respective core designs in order to get more IPC in the face of getting less in performance advantages with the newer smaller process nodes not able to yield as much generational clock frequency increases as in the past. So Zen-3 will have to go wider order superscalar and maybe have some AVX512 options as well. I’d love to see AMD Bring some L4 cache to the I/O die at some point in time for any workloads that really can benefit but that’s maybe something that will have to wait for Zen-4 with hopefully Zen-3 getting some larger shared per CCD Die/Chiplet L3 cache over what Zen-2 offers.

    Really the Epyc/SP3 motherboard warranty/support periods are much longer than any Consumer/Threadripper offerings and that has to factor in to TCO for any professional end users that can really also deduct Epyc’s higher up front costs as a business expense. And really as far as ECC CPU/MB partner support goes Epyc CPU/MBs are vetted/certified on all the professional software packages whereas Threadripper CPUs/MBs will have less testing/certification guarantees and less product support should that be needed from AMD and the SP3 Motherboard makers .

    Threadripper may be sufficient for some if they absolutely need the higher clocks and are not dependent on ECC for certain workloads and maybe that’s good enough for some but folks need to do some more in depth cost/benefit analysis that also factors in the CPU’s cost/per memory channel and cost/per PCIe lane as well as the MB’s cost/memory channel and cost/PCIe lane. And that can make Epyc/SP3 the better deal on a cost/feature basis.

  8. @Fabian, what has this to do with dirty tricks? Fact is that my math/linear algebra heavy programs on Intel CPUs ran circles around both the previous gen Threadripper and Epyc CPUs at otherwise identical frequencies and memory speeds. I could not care less what “games” anyone is playing when my back tests and other heavy math procedures finish in half the time on one CPU vs the other. I have been a very heavy AMD critic for math heavy applications and have voiced this on this website multiple times. I am always happy to revisit and test new AMD products, but so far neither Epyc nor Threadripper has come even close in performance to Intel’s CPUs for math heavy applications.

  9. @matt what Fabian pointed to is that if you simply force MATLAB to properly recognize the math abilities of the AMD CPU, it will run many more circles around the Intel chips. The AMD CPUs are faster on anything except a few AVX-512 special cases, so if you don’t see that, there is a good chance it’s your math library that is heavily underutilizing the AMD chip. Nothing to criticize AMD for; they can’t fix your code for you.
