AMD Ryzen 9 9950X3D2 Dual Edition Review: Going A Bit Higher

Ryzen 9 9950X3D2 Performance

Before we jump into our own results, here is AMD’s official summary slide for the performance of the 9950X3D2.

AMD Ryzen 9 9950X3D2 Official Performance Expectations

The long and short of matters is that AMD is only touting very modest gains: a 7% average improvement in rendering in specific applications, and a 3% gain in average productivity performance. While the additional cache does bring some benefits, it will not be the same kind of performance uplift as the original X3D parts, and AMD is setting expectations accordingly.

With that out of the way, for our testing, we have a bit of a smorgasbord of different chips. STH does not have an exhaustive collection of desktop CPUs tested with the legacy methodology, as we are swapping to the new benchmark suite with Ubuntu 26.04 LTS later this week. In particular, we do not have any of Intel’s Core Ultra 200 series desktop chips, so this will not be a comprehensive comparison to Intel’s wares. We do have a decent collection of AMD’s desktop chips, however, including the original 9950X, 9950X3D, and now the 9950X3D2. So we can directly see how performance has evolved for AMD’s top chip with the addition of one, and now two, stacks of L3 V-cache.

Linux Kernel Compile Benchmark

First up, we have a fairly standard Linux kernel compilation benchmark.

AMD Ryzen 9 9950X3D2 Linux Kernel Compile Benchmarks

Code compilation is one of those tasks that is reasonably parallel (at least until linking), which means that it has room to scale without creating too much cross-CCD traffic. The end result is that the 9950X3D2 manages to improve upon the X3D chip by a few percent, and it is a few percent over the original 9950X, going to show some of the productivity gains afforded by the additional cache and power.

7-Zip Compression Benchmark

On the compression side, we wanted to see how the new chip would fare compared to some other options.

AMD Ryzen 9 9950X3D2 7zip Compression Benchmarks

It is a very similar story in our 7-Zip compression benchmark. The extra cache does not make a massive difference, but it helps enough to not only offset the small drop in peak clock speeds, but also push the 9950X3D2 ahead of the X3D by a few percent.

In some respects, this is a more remarkable performance improvement because this benchmark as a whole has only modestly improved from Zen architecture IPC gains. This goes to show the importance of data flow to processor performance and why adding cache can sometimes be the most effective way to improve performance.

OpenSSL Benchmarks

We are going to put the OpenSSL numbers up with many of the lower-end systems we test just to give some sense of scale.

AMD Ryzen 9 9950X3D2 OpenSSL Sign Benchmarks

Here are the verify results:

AMD Ryzen 9 9950X3D2 OpenSSL Verify Benchmarks

The extra cache and power do help the 9950X3D2 push ahead in both signing and verifying, though not to a significant degree in either case.

AgentSTH V6 Results Preview

Something we discussed is a new Agentic AI benchmark focused on how well CPUs perform on the agent part of workloads, rather than the LLM part, which often runs on GPUs. It turns out that Agentic AI CPU workloads often mirror much of what we see in more traditional workloads, so as we have been overhauling our suite, we wanted to modernize as well.

For those who do not want to see AI, tasks like compression are still very relevant to general-purpose computing use. What we did, however, was to profile a number of different Agentic AI workloads to get a mix for the composite score.

Also, and this is important, we are splitting up tasks. On modern CPUs with hundreds of cores, having 100 or more cores stall while a single task finishes on a single core is not ideal. Realistically, today’s CPUs run containers, sandboxes, and virtual machines so that a single server can service multiple workloads simultaneously. So, we are moving to an era where we will look at a number of different CPU splits to see how a chip handles those simultaneous workloads and how they scale. That means we are no longer running a suite of benchmarks across a CPU just once. Instead, we are now running several different workload configurations on each CPU.
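To make the idea of a "CPU split" concrete, here is a minimal Python sketch of carving a chip's cores into one group per agent. This is not the AgentSTH harness (which is not public); the function name `split_cores` and the contiguous, evenly sized grouping are assumptions made purely for illustration.

```python
def split_cores(total_cores: int, agents: int) -> list[list[int]]:
    """Partition core IDs 0..total_cores-1 into contiguous groups, one per agent.

    Hypothetical helper for illustration only; a real harness may group
    cores differently (for example, by CCD or by SMT sibling).
    """
    if agents < 1 or agents > total_cores:
        raise ValueError("agents must be between 1 and total_cores")
    base, extra = divmod(total_cores, agents)
    groups = []
    start = 0
    for i in range(agents):
        # The first `extra` groups absorb one leftover core each.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


if __name__ == "__main__":
    # The 1, 2, and 4 agent views on a 16-core part.
    for agents in (1, 2, 4):
        spans = [f"{g[0]}-{g[-1]}" for g in split_cores(16, agents)]
        print(f"{agents} agent(s): {spans}")
```

On Linux, each group could then be applied to a worker process with `os.sched_setaffinity`. Note that logical CPU numbering does not necessarily map contiguous IDs onto a single CCD, so a real split would need to consult the system topology first.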

Also, we are going to land the public benchmark on Ubuntu 26.04 LTS since we have found a few instances where Linux 7.0 makes a notable difference, and we are days away from the new OS release. Instead of an incremental upgrade for STH, this is a complete overhaul, modernized and written in Rust, and targeted at a modern LTS release.

Starting out, 1 Agent is one instance of the suite running across the entire socket, whereas 2 and 4 Agents are views of splitting the task up across multiple agents running simultaneously. We have a lot more split data on these, but this is just a high-level view of the difference.

9950X3D2 Summary_multi_agent_solo

These results are all normalized to running the benchmark on a single core of that machine. The 4 Agents results look much better than the 1 Agent results because of the issue of an entire chip stalling while a single core finishes its work. The key lesson here is that running a single workload across even a 12- or 16-core CPU is relatively inefficient these days. Put simply, the total performance of the chips increases when multiple agents run instead of just a single agent.
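One simple way to reason about why more agents can yield more total throughput is Amdahl's law. The sketch below is a toy model, not how the benchmark computes its scores: it assumes each agent's workload has a fixed serial fraction, that agents get equal core shares, and that agents do not interfere with one another. The function name and the 10% serial fraction are made up for illustration.

```python
def aggregate_speedup(total_cores: int, agents: int, serial_fraction: float) -> float:
    """Total speedup over one core when `agents` identical workloads each
    get an equal share of the cores (toy Amdahl's-law model).

    Assumes no contention between agents for cache, memory, or power.
    """
    cores_per_agent = total_cores / agents
    # Amdahl's law for one agent: 1 / (s + (1 - s) / n)
    per_agent = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores_per_agent)
    return agents * per_agent


if __name__ == "__main__":
    # Hypothetical 10% serial fraction on a 16-core chip.
    for agents in (1, 2, 4):
        speedup = aggregate_speedup(16, agents, 0.10)
        print(f"{agents} agent(s): {speedup:.2f}x over a single core")
```

Under these assumptions, a single agent on 16 cores reaches 6.4x a single core, while four agents on four cores each reach roughly 12.3x in aggregate, because fewer cores sit idle during each agent's serial phases. Real results will differ, since agents on a shared chip do compete for cache, memory bandwidth, and power budget.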

9950X3D2 Summary_single_16c_solo

One other view we wanted to look at is the performance of the chips at different core counts with a single agent. Here we have 16 cores and a composite score for the CPU across all subtests. We are using 16 cores as a standard to make comparisons easier.

9950X3D2 Throughput_subtests_solo

Along with total throughput, we also have a set of tests that are more focused on coordination tasks between agents.

9950X3D2 Coordination_subtests_solo

Memory may be the most interesting score here, given the 9950X3D2’s specs. It will be interesting to eventually see how it compares to other processors that have less L3 cache and, consequently, more pressure on the rest of their memory subsystems.

9950X3D2 Memory_subtests_solo

Overall, this is something we have been working on for some time in collaboration with one of the hyper-scale cloud providers’ performance gurus. Our observation a few years ago (over BBQ) was that CPU benchmarking often runs one process per core or per system, but modern cloud CPUs run multiple simultaneous workloads. This is just the first step in getting there. The goal is to turn this into something that we can release so folks can run it easily. We are also exploring a Geekbench-style distribution with pre-built binaries to make that even simpler.

Let us keep going with Cinebench and Geekbench.