If you are buying into the high-end desktop (HEDT) segment over the next few months, the AMD Ryzen Threadripper 3970X is the pinnacle, albeit at $1999. The CPU itself costs more than most modern corporate IT notebooks. It is a tool designed not for the masses but for a well-defined market segment that needs enormous desktop compute resources. In our review, we are going to discuss what has changed compared to the previous generation and how this chip fits into the ecosystem of hardware available in the market. We are getting more ambitious and will also show both Windows and Linux performance numbers to help those who support users on both OSes.
Key stats for the AMD Ryzen Threadripper 3970X: 32 cores / 64 threads with a 3.7GHz base clock and 4.5GHz turbo boost. There is 128MB of onboard last-level cache. The CPU features a 280W TDP. These are $1999 list price parts.
Here is what the lscpu output looks like for an AMD Ryzen Threadripper 3970X:
AMD is claiming 144MB of cache but it is important to remember this is really L2 + L3 cache. Still, if you compare the 128MB of L3 cache here in 8x 16MB segments, you get vastly more cache than top-end Intel SKUs like the Intel Xeon W-3275 28-core halo product which has only 38.5MB of L3 cache.
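The 144MB figure can be reproduced with a quick calculation. The sketch below takes the 32-core count and the 8x 16MB L3 layout from the text, and assumes 512KB of L2 per core, which is the Zen 2 figure but is not stated above.

```python
# Back-of-the-envelope check of the 144MB "total cache" figure.
CORES = 32               # from the key stats above
L3_SEGMENTS = 8          # 8x 16MB L3 segments, per the text
L3_SEGMENT_MB = 16
L2_PER_CORE_KB = 512     # assumption: Zen 2 L2 size per core, not stated in the text
XEON_W3275_L3_MB = 38.5  # L3 figure quoted in the text


def total_cache_mb(cores=CORES, l3_segments=L3_SEGMENTS,
                   l3_segment_mb=L3_SEGMENT_MB, l2_per_core_kb=L2_PER_CORE_KB):
    """Return (l2_mb, l3_mb, combined_mb) for the given cache layout."""
    l2_mb = cores * l2_per_core_kb / 1024
    l3_mb = l3_segments * l3_segment_mb
    return l2_mb, l3_mb, l2_mb + l3_mb


if __name__ == "__main__":
    l2, l3, total = total_cache_mb()
    print(f"L2: {l2:.0f}MB, L3: {l3:.0f}MB, L2 + L3: {total:.0f}MB")
    print(f"L3 vs. Xeon W-3275: {l3 / XEON_W3275_L3_MB:.1f}x")
```

Under that assumption, L2 accounts for 16MB of the total, so the L3-only comparison against Intel is the more like-for-like one.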
Since the 3rd generation Ryzen Threadripper uses the AMD EPYC 7002 series “Rome” package as a base, it has features such as PCIe Gen4 and DDR4-3200 support. To give you a visual on how to think of 3rd gen Threadripper, consider it a HEDT part with the ghost of the EPYC 7002 series infused.
With the 3rd gen Threadripper platform, AMD has taken the leading socketed server part and unleashed it for desktops destined for creative professionals. At 32 cores, the AMD Ryzen Threadripper 3970X has more cores than the top commercially available socketed Intel Xeon CPU.
AMD TRX40 Platform
With the 3rd generation AMD Ryzen Threadripper family we get a new TRX40 platform. The TRX40 brings with it PCIe Gen4. That is a feature Intel lacks in this generation. The CPU to TRX40 interface has gone from a Gen3 x4 link to a Gen4 x8 link effectively quadrupling bandwidth to the chipset.
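To put rough numbers on the "quadrupling" claim, here is a minimal sketch. It uses the published per-lane transfer rates for PCIe Gen3 and Gen4 (8 and 16 GT/s) with 128b/130b encoding, and ignores protocol overhead, so the results are upper bounds rather than measured throughput.

```python
# Rough one-direction bandwidth of a PCIe link, before protocol overhead.
GT_PER_SEC = {3: 8.0, 4: 16.0}   # giga-transfers per second, per lane
ENCODING = 128 / 130             # 128b/130b line encoding used by Gen3 and Gen4


def link_gb_per_sec(gen, lanes):
    """Approximate one-direction bandwidth in GB/s for a PCIe link."""
    return GT_PER_SEC[gen] * ENCODING / 8 * lanes


old_link = link_gb_per_sec(3, 4)   # Gen3 x4 chipset link
new_link = link_gb_per_sec(4, 8)   # Gen4 x8 chipset link on TRX40

if __name__ == "__main__":
    print(f"Gen3 x4: {old_link:.2f} GB/s")
    print(f"Gen4 x8: {new_link:.2f} GB/s")
    print(f"Ratio:   {new_link / old_link:.1f}x")
```

The 4x comes from two independent doublings: twice the lanes and twice the per-lane rate.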
Realistically, while the platform’s quad-channel memory is more similar to Intel’s X299 platform, the I/O capabilities are more like an upgraded version of the Xeon W-3200 series platforms like we saw in our Supermicro X11SPA-T motherboard review. PCIe Gen4 gives AMD a higher I/O bandwidth platform while the Intel LGA3647 platform has additional memory channels and capacity.
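The memory-channel trade-off can be sketched with theoretical peak bandwidth figures. The quad-channel DDR4-3200 configuration comes from this review; the six-channel DDR4-2933 entry is an assumption used only for comparison, and real-world throughput will be lower than either number.

```python
# Theoretical peak DRAM bandwidth: channels x transfers/s x bytes per transfer.
BYTES_PER_TRANSFER = 8   # a DDR4 channel has a 64-bit data bus


def peak_memory_gb_per_sec(channels, mega_transfers_per_sec):
    """Theoretical peak bandwidth in GB/s (decimal) for a DDR4 configuration."""
    return channels * mega_transfers_per_sec * BYTES_PER_TRANSFER / 1000


# Quad-channel DDR4-3200, the Threadripper 3970X configuration from the text
threadripper = peak_memory_gb_per_sec(4, 3200)

# Assumption for comparison only: a six-channel DDR4-2933 workstation platform
six_channel = peak_memory_gb_per_sec(6, 2933)

if __name__ == "__main__":
    print(f"4x DDR4-3200: {threadripper:.1f} GB/s")
    print(f"6x DDR4-2933: {six_channel:.1f} GB/s")
```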
Many commented on our previous articles, in our forums, and on the Internet, lamenting that the 3rd Generation Threadripper family needed new motherboards. There are two points to address this concern. First, PCIe Gen4 requires higher-quality PCB materials, and that makes the transition a logical point to upgrade platforms. Second, the volume buyers in this market purchase a PC for office work, then upgrade it on an IT refresh cadence. They are not swapping CPUs into old systems. Given the choice between backward compatibility and game-changing new features, we take new features and moving the market forward.
Major Topology Overhaul
AMD has offered 32-core workstation parts before, specifically with the AMD Ryzen Threadripper 2990WX. Essentially based on the AMD EPYC 7001 “Naples” generation, the 2990WX is a four-die, four-NUMA-node design. As you can see, the 2990WX has four NUMA nodes but only two have direct access to memory while the other two have to hop over Infinity Fabric to memory attached to a different die.
This topology worked. However, it probably would have been better if each die had access to a single memory channel in a 1+1+1+1 rather than a 2+0+2+0 quad-channel configuration. Some things were less than straightforward with this former topology.
With the new AMD Ryzen Threadripper 3970X, we see a more AMD EPYC 7002 “Rome” series-like topology. You can compare the below to our AMD EPYC 7502P Review as an example.
With the new I/O die configuration, more or less taken from the EPYC side, one gets four DDR4 channels that connect to the I/O die. The I/O die also has PCIe lanes and the x86 core dies attached to it. As a result, we get something that most OSes see as a single NUMA node. PCIe roots for CPU attached lanes all terminate at the same I/O die as well.
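For readers who want to check how many NUMA nodes their own Linux system exposes, here is a minimal sketch. It assumes the standard sysfs layout where each node appears as a `nodeN` directory under `/sys/devices/system/node`; the helper name `count_numa_nodes` is ours, and firmware settings can change what the OS reports.

```python
import os
import re


def count_numa_nodes(node_dir="/sys/devices/system/node"):
    """Count NUMA nodes by listing nodeN entries under the Linux sysfs node directory."""
    pattern = re.compile(r"^node\d+$")
    try:
        entries = os.listdir(node_dir)
    except FileNotFoundError:
        return None  # not Linux, or sysfs NUMA information is not exposed
    return sum(1 for name in entries if pattern.match(name))


if __name__ == "__main__":
    nodes = count_numa_nodes()
    if nodes is None:
        print("NUMA information not available on this system")
    else:
        print(f"NUMA nodes visible to the OS: {nodes}")
```

The `lscpu` tool reports the same information in its NUMA lines, so this is mainly useful when scripting checks across several machines.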
We are releasing a Core i9-10980XE review at the same time as this review. On the Intel side, pictures like the above have been the company’s standard. AMD is back to this design which is helping it garner wins in the server space because it minimizes some of the strange behavior we saw with chips like the first and second-generation Threadripper parts.
Here is the test configuration we used for the Ryzen Threadripper 3970X:
- Motherboard: MSI Creator TRX40
- CPU: AMD Ryzen Threadripper 3970X 32-core
- GPU: NVIDIA GeForce RTX 2080 SUPER
- Cooling: Noctua NH-U14S TR4-SP3
- RAM: 4x Corsair 16GB DDR4-3200 UDIMM (64GB Total)
- SSD: Samsung PM961 1TB
- OS: Windows 10 Pro Workstation
As a quick note here, the retail packaging comes with a case badge, which is nice, but there are two more important bits. First, one gets a torque driver that helps one secure the chip into the socket. Second, one gets a water-cooling adapter ring.
The new 3rd Generation AMD Ryzen Threadripper family shares a lot with AMD EPYC, so if you use the Threadripper tool it will work on EPYC sockets as well. While the sockets are different, the physical latching mechanism is very similar.
For our CPU we will be using an AMD Ryzen Threadripper 3970X (32 core/64 thread) that you can see in the CPU-Z shot here:
The AMD Ryzen Threadripper 3970X is a very capable CPU, with turbo speeds that can reach up to 4.5GHz.
Let us continue with Windows performance testing.
I believe the header of the Power Load Test should read 12V, not 120V.
With regard to PCIe I/O has an interesting note on EPYCs. Something called “Preferred IO Device”:
I haven’t been able to figure out what that is about or whether it’s only a Mellanox thing. Wonder if Threadripper has the same.
Really a shame about those RDIMMs. For this reason I’m going to have to get an EPYC at lower clocks for a workstation I’ll be getting next year instead of a TR. It’s a shame, really.
Totally agree about the platform thing. I’m not switching out CPUs in $6000+ computers.
How were the CPU temps with the noctua-nh-u14s-tr4-sp3? I am surprised that an air cooler could handle this monster!
Any tests that showcase performance for single threaded math heavy operations? I had to dump a previous Threadripper build because it hugely lagged behind Intel CPUs mostly due to the absence of AVX2. Since then I have never touched AMD ever again. Am happy to revisit but I would like to see how it performs in single threads that require matrix computations and many millions of mathematical operations per second, ideally vectorized. Any such tests?
@John Lee Could you please make the textual output from lscpu available? I don’t want to be typing all these abbreviations by hand yet I want to see how many different features does it have compared to my trusty TR1920X. Thanks!
By the way, does anyone know what the situation is with encrypted main memory and encrypted memory for virtual machines on this generation of Threadripper? The first generation showed support in the CPU flags but was missing something else from BIOS so it didn’t (wasn’t supposed to) work. It’s a dick move by AMD to not support them on Threadripper, IMO, and I wonder if they kept it.
Thank you for a great review as always. I appreciate the inclusion of SPECworkstation, lots of programs there I use in the HPC world. I need to do some digging on my own to figure out how they build their tests though. Some of those programs are a mess of potential different libraries, MPI,BLAS,LAPACK,FFTW, etc.
Also I’d love to see some RandomX benchmarks like you did for Epyc. The 3970X should be perfect for it, I expect 25-30kh/s. While I’m asking, a deep dive on the cache would be interesting too, I’ve been seeing some results around online indicating there may be architectural differences in Zen2 Threadripper’s cache access vs Zen2 Ryzen.
Threadripper comes with an ECC caveat that’s if the Motherboard maker chooses to support it and then that ECC support is somewhat lacking compared to AMD’s Epyc branded SKUs. And the single socket Epyc P series of 7002 SKUs are still affordable with the MBs offering up more memory channels(8) and more PCIe lanes with the full vetting/certification for ECC memory types compared to any consumer Zen-2/MB based variants currently.
There are a few benchmarks where the 3960X is performing on par with or a little better than the 3970X. Could that be the result of the 3 out of 4 enabled CPU cores on the 3960X’s CCX units still getting access to the same amount of L3 cache as the 4 enabled cores on the 3970X’s CCX units, where the 4 enabled cores have effectively less L3 per core to share than on the 3960X? I hope there will be more testing of the cache subsystems on Zen-2 going forward for any SKUs that may have the full complement of L3 cache made available even though one or more cores per CCX unit are disabled, and of what workloads may benefit from having more total L3 cache per enabled core on the CCX.
I’m really interested on seeing any testing done to confirm that for Zen-2 but Zen-3 will see AMD getting rid of the CCX construct altogether and making the CCD die/chiplet have its full Complement of L3 available to the full 8 cores instead of partitioning the CCD into 2 CCX Units. The big question for 8 cores per CCD and no CCX units besides less Infinity Fabric traffic needed to get at that larger shared pool of L3 cache on Zen-3’s CCD die/chiplet is will AMD switch to a Ring Bus configuration on the 8 core CCD or some more complicated topology for 8 cores versus the 4 cores/CCX construct that’s used currently.
Both AMD and Intel appear to be going wider order superscalar with their respective core designs in order to get more IPC in the face of getting less in performance advantages with the newer smaller process nodes not able to yield as much generational clock frequency increases as in the past. So Zen-3 will have to go wider order superscalar and maybe have some AVX512 options as well. I’d love to see AMD Bring some L4 cache to the I/O die at some point in time for any workloads that really can benefit but that’s maybe something that will have to wait for Zen-4 with hopefully Zen-3 getting some larger shared per CCD Die/Chiplet L3 cache over what Zen-2 offers.
Really the Epyc/SP3 motherboard warranty/support periods are much longer than any Consumer/Threadripper offerings and that has to factor in to TCO for any professional end users that can really also deduct Epyc’s higher up front costs as a business expense. And really as far as ECC CPU/MB partner support goes Epyc CPU/MBs are vetted/certified on all the professional software packages whereas Threadripper CPUs/MBs will have less testing/certification guarantees and less product support should that be needed from AMD and the SP3 Motherboard makers.
Threadripper may be sufficient for some if they absolutely need the higher clocks and are not dependent on ECC for certain workloads and maybe that’s good enough for some but folks need to do some more in depth cost/benefit analysis that also factors in the CPU’s cost/per memory channel and cost/per PCIe lane as well as the MB’s cost/memory channel and cost/PCIe lane. And that can make Epyc/SP3 the better deal on a cost/feature basis.
@Matt: You should check if it’s a fundamental issue or just Intel’s dirty tricks / lazy developers: https://old.reddit.com/r/matlab/comments/dxn38s/howto_force_matlab_to_use_a_fast_codepath_on_amd/
@Fabian, what has this to do with dirty tricks? Fact is that my math/linear algebra heavy programs on Intel CPUs ran circles around both the previous gen Threadripper and Epyc CPUs at otherwise identical frequencies and memory speeds. I could not care less what “games” anyone is playing when my back tests and other heavy math procedures finish in half the time on one CPU vs the other. I have been a very heavy amd critic for math heavy applications and have voiced such on this website multiple times. Am always happy to revisit to test new amd products but so far neither Epyc nor Threadripper came even close in performance to Intel’s cpu for math heavy applications.
@matt what fabian pointed to is that if you simply force matlab to properly recognize the math abilities of the AMD CPU it will run many more circles around the intel chips… the amd cpus are faster on anything except a few avx512 special cases, so if you dont see that, good chance it’s your math library that is heavily underutilizing the AMD chip. Nothing to criticize amd for, they cant fix your code for you.