Ryzen 9 9950X3D2 Performance
Before we jump into our own results, here is AMD’s official summary slide for the performance of the 9950X3D2.

The long and short of matters is that AMD is only touting very modest gains: a 7% average improvement in rendering in specific applications, and a 3% gain in average productivity performance. While the additional cache does bring some benefits, it will not be the same kind of performance uplift as the original X3D parts, and AMD is setting expectations accordingly.
With that out of the way, for our testing, we have a bit of a smorgasbord of different chips. STH does not have an exhaustive collection of desktop CPUs tested with the legacy methodology, as we are swapping to the new benchmark suite with Ubuntu 26.04 LTS later this week. In particular, we do not have any of Intel’s Core Ultra 200 series desktop chips, so this will not be a comprehensive comparison to Intel’s wares. We do have a decent collection of AMD’s desktop chips, however, including the original 9950X, 9950X3D, and now the 9950X3D2. So we can directly see how performance has evolved for AMD’s top chip with the addition of one, and now two, stacks of L3 V-cache.
Linux Kernel Compile Benchmark
First up, we have a fairly standard Linux kernel compilation benchmark.

Code compilation is one of those tasks that is reasonably parallel (at least until linking), which means that it has room to scale without creating too much cross-CCD traffic. The end result is that the 9950X3D manages to improve upon the X3D chip by a few percent – and it a few percent over the original 9950X – going to show some of the productivity gains afforded by the additional cache and power.
7-Zip Compression Benchmark
On the compression side, we wanted to see how the new chip would fare compared to some other options.

It is a very similar story to our 7-Zip compression benchmark. The extra cache does not make a massive degree of difference, but it does help enough to not just offset the small drop in peak clock speeds, but push the 9950X3D2 ahead of the X3D by a few percent.
In some respects, this is a more remarkable performance improvement because this benchmark as a whole has only modestly improved from Zen architecture IPC gains. This goes to show the importance of data flow to processor performance and why adding cache can sometimes be the most effective way to improve performance.
OpenSSL Benchmarks
We are going to put the OpenSSL numbers up with many of the lower-end systems we test just to give some sense of scale.

Here are the verify results:

The extra cache and power do help the 9950X3D2 push ahead in both signing and verifying. Though neither to a significant degree.
AgentSTH V6 Results Preview
Something we discussed is a new Agentic AI benchmark focused on how well CPUs perform on the agent part of workloads, rather than the LLM part, which often runs on GPUs. It turns out that Agentic AI CPU workloads often mirror much of what we see in more traditional workloads, so as we have been overhauling our suite, we wanted to modernize as well.
For those who do not want to see AI, tasks like compression are still very relevant to general-purpose computing use. What we did, however, was to profile a number of different Agentic AI workloads to get a mix for the composite score.
Also, and this is important, we are splitting up tasks. On modern CPUs with hundreds of cores, having a single task on a single core constantly stalling over 100 cores is not ideal. Realistically, today’s CPUs run containers, sandboxes, and virtual machines to use a single server to service multiple workloads simultaneously. So, we are moving to an era where we will look at a number of different CPU splits to see how it handles those simultaneous workloads and how they scale. That means we are not running a suite of benchmarks across a CPU once. Instead, we are now running different workload configurations on the CPU, for example.
Also, we are going to land the public benchmark on Ubuntu 26.04 LTS since we have found a few instances where Linux 7.0 makes a notable difference, and we are days away from the new OS release. Instead of an incremental upgrade for STH, this is a complete overhaul, modernized and written in Rust, and targeted at a modern LTS release.
Staring out, 1 Agent is running one instance of the suite across the entire socket, whereas 2 and 4 Agents are views of splitting the task up to multiple simultaneous agents running. We have a lot more split data on these, but this is just a high-level view of the difference.

These results are all normalized to running the benchmark on a single core of that machine. The reason the 4 Agents look much better than the 1 Agent is because of the issue of stalling an entire chip for a single core waiting to finish. The key lesson here is that running a single workload across even a 12 or 16-core CPU is relatively less efficient these days. Easier said, the total performance of the chips increases as multiple agents are run instead of just a single agent.

One other one we wanted to look at is the performance of the chips at different core counts and a single agent. We have 16 cores here and a composite score for the CPU across all subtests. We are using those 16 cores just as a standard here to make it easier to compare.

Along with total throughput, we also have a set of tests that are more focused on coordination tasks between agents.

Memory may be the most interesting score here, given the 9950X3D2’s specs. So it will be interesting to eventually see how it compares to other processors without as much L3 cache, and consequently, more pressure on the rest of their memory subsystems.

Overall, this is something we have been working on for some time in collaboration with one of the hyper-scale cloud providers’ performance gurus. Our observation a few years ago (over BBQ) was that CPU benchmarking often runs one process per core or per system, but modern cloud CPUs run multiple simultaneous workloads. This is just the first step in getting there. The goal is to turn this into something that we can release for folks so they can run easily. We are also exploring just doing a Geekbench-style distribution and providing pre-built binaries to make it easy for folks.
Let us keep going with Cinebench and Geekbench.


