AMD EPYC 7702P Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have been using for years, as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file and the Linux 4.4.2 kernel from kernel.org, and we make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
While doing this review, I realized that we already had excellent charts for the AMD EPYC 7702P that we used in our initial launch piece. These charts have the Intel Xeon E5/E7 V4 generation in yellow, the Xeon Scalable first generation in gray, and the 2nd generation in black. The first generation AMD EPYC is in blue while second-generation parts are in green, which matches the carriers that the AMD parts use in each generation.
In these charts, we have a large number of dual-socket results as well as single-socket results. The AMD EPYC 7702P appears as a green bar in the region of the “Dual” results, but without the “Dual” label since it is a single CPU. Here you can see that it is just about at the level of the dual Intel Xeon Platinum 8260 configuration.
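As a rough illustration of the compiles-per-hour metric used for the kernel compile results, the conversion from one compile's wall-clock time can be sketched as below. The function name and the sample timings are hypothetical and are not taken from the Linux-Bench scripts.

```python
def compiles_per_hour(compile_seconds: float) -> float:
    """Convert one kernel compile's wall-clock time into compiles per hour."""
    if compile_seconds <= 0:
        raise ValueError("compile time must be positive")
    return 3600.0 / compile_seconds


# Hypothetical timings in seconds, for illustration only (not measured results).
for seconds in (30.0, 45.0, 90.0):
    rate = compiles_per_hour(seconds)
    print(f"{seconds:5.1f} s per compile -> {rate:6.1f} compiles/hour")
```

Expressing the result as a rate means a higher bar is better, which matches how the other throughput charts read.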
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
We added in the Marvell ThunderX2 numbers here. The single-socket AMD EPYC 7702P is outperforming dual 32-core ThunderX2 CPUs. For the Arm server ecosystem to get competitive, it needs to move to next-gen Arm Neoverse N1 cores in volume production. If one just needs an Intel alternative, AMD has a strong contender at a lower price per unit of performance.
Looking at single-socket results, we can see that the AMD EPYC 7742 single-socket configuration is very strong. There is a benefit to the higher clock speeds. Still, when we say that these 64-core parts are in a different class than competitive single-socket offerings, this is a great example of the step function.
7-zip Compression Performance
7-zip is a widely used compression/ decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Again, thanks to a huge core count and large, fast caches, one gets high-end 64-core performance that outpaces dual 28-core Xeon Scalable CPUs here.
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. As with GROMACS, we have been working hard to support AVX-512 on Intel as well as AVX2 on the AMD Zen architecture. Here are the comparison results for the legacy data set:
Here we are generally seeing AMD’s per-core performance rival Intel’s when we are not using AVX2 and AVX-512. The AMD EPYC 7702P has more than twice the number of cores per socket.
OpenSSL is widely used to secure communications between servers. It is an important component in many server stacks. We first look at our sign tests:
Here are the verify results:
Here the dual Intel Xeon Platinum 8280 system is faster than the single socket AMD EPYC 7702P. Saying that two $10,000 list price CPUs (over $20,000 total) are slightly faster than a $4,425 list price CPU still does not feel like a win.
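To put the list prices quoted above in perspective, here is a minimal sketch of the break-even arithmetic: how much faster the dual Xeon configuration would need to be to match a single EPYC 7702P on performance per list-price dollar. The Xeon figure is the rounded $10,000 per CPU from the text, and the function name is hypothetical.

```python
# List prices as quoted in the text; the Xeon figure is the rounded $10,000 per CPU.
XEON_8280_LIST_USD = 10_000
EPYC_7702P_LIST_USD = 4_425


def breakeven_speedup(cost_a: float, cost_b: float) -> float:
    """Speedup configuration A needs over B to match B's performance per dollar."""
    if cost_a <= 0 or cost_b <= 0:
        raise ValueError("costs must be positive")
    return cost_a / cost_b


dual_xeon_cost = 2 * XEON_8280_LIST_USD
ratio = breakeven_speedup(dual_xeon_cost, EPYC_7702P_LIST_USD)
print(f"Dual Xeon list cost: ${dual_xeon_cost:,}")
print(f"Break-even speedup vs. one EPYC 7702P: {ratio:.2f}x")
```

This only considers CPU list prices. It ignores platform, memory, and power costs, as well as any discounts.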
We also wanted to draw attention to the dual Intel Xeon E5-2699 V3 (brown) results. Here, the single AMD EPYC 7702P is faster than four of the top-end Xeon E5 V3 chips from Q3 2014. If you are replacing five-year-old Xeon E5s, you should get a 4:1 or better socket consolidation ratio with the 64-core EPYC parts. That type of consolidation has enormous TCO implications.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
GROMACS STH Small AVX2/ AVX-512 Enabled
During our initial benchmarking efforts, we found that our version of GROMACS was taking advantage of AVX-512 on Intel CPUs. We also found that it was not taking proper advantage of the AMD EPYC 7002 architecture. From our original AMD EPYC 7002 Series Rome Delivers a Knockout piece:
We have had one of the lead developers on our dual AMD EPYC 7742 machine and changes are being upstreamed. The initial results were putting dual AMD EPYC 7742’s at around 2.7x of dual Intel Xeon Gold 6148F parts which are a go-to HPC chip. The above will change significantly once this is changed, but not in time for this AMD EPYC 7702P review.
Instead of continuing to publish this benchmark, we are going to hold off until later in 2019 when those results get upstreamed. At worst, as shown above, the chips are about even. When properly optimized, they are well ahead of Intel’s offerings.
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Here we can see that Intel Xeon Platinum 8180 and Platinum 8280 parts pull ahead by a nice margin. Still, when we look at the single socket results, we are seeing an enormous Intel to AMD delta.
Also, looking at the top-end dual Xeon E5-2699 V4 chips, one can see that a single AMD EPYC 7702P delivers better than a 2:1 socket consolidation ratio here. If you are on a three-year replacement cycle, you can consolidate at better than a 2:1 ratio throughout the range. That kind of a number is not available with current generation Xeons.
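The consolidation ratios mentioned in this review come down to comparing per-socket throughput. A minimal sketch of that arithmetic follows. The function name and the benchmark scores are hypothetical placeholders, not values from our charts.

```python
def socket_consolidation_ratio(
    new_score: float, new_sockets: int, old_score: float, old_sockets: int
) -> float:
    """Number of old sockets that one new socket replaces at equal throughput."""
    if min(new_score, old_score) <= 0 or min(new_sockets, old_sockets) <= 0:
        raise ValueError("scores and socket counts must be positive")
    new_per_socket = new_score / new_sockets
    old_per_socket = old_score / old_sockets
    return new_per_socket / old_per_socket


# Hypothetical scores: a single-socket system slightly ahead of an older dual-socket system.
ratio = socket_consolidation_ratio(
    new_score=210.0, new_sockets=1, old_score=200.0, old_sockets=2
)
print(f"Consolidation ratio: {ratio:.2f}:1")
```

A single-socket result that merely ties a dual-socket result is already a 2:1 ratio, so any lead over the dual-socket bar pushes the ratio above 2:1.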
The first workload we wanted to look at is SPECrate2017_int_base performance. Specifically, we wanted to show the difference between what we get with Intel Xeon results using icc and AMD EPYC results using AOCC. We expect server vendors get better results than we do, but this gives you an idea of where we are:
Overall, SPECrate2017_int_base results show a pattern consistent with what we have seen elsewhere. The AMD EPYC 7702P is fast with a massive 256MB L3 cache and 128 threads.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
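Since the application is closed source, we cannot show the workload itself, but the metric can be sketched in general terms: given the completion time observed at each tested VM count, report the largest count that still meets the SLA. Everything in this sketch is an assumption for illustration, including the function name, the SLA value, and the timing data.

```python
from typing import Dict


def max_vms_under_sla(completion_seconds_by_vm_count: Dict[int, float],
                      sla_seconds: float) -> int:
    """Largest tested VM count whose completion time meets the SLA (0 if none do)."""
    passing = [
        vm_count
        for vm_count, seconds in completion_seconds_by_vm_count.items()
        if seconds <= sla_seconds
    ]
    return max(passing, default=0)


# Hypothetical measurements: VM count -> seconds to complete the work (not real results).
observed = {8: 40.0, 16: 52.0, 32: 71.0, 64: 118.0}
best = max_vms_under_sla(observed, sla_seconds=90.0)
print(f"Max VMs online under the SLA: {best}")
```

Reporting only tested VM counts avoids interpolating between runs, which matters when performance falls off sharply once cores or memory are oversubscribed.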
AMD simply has an awesome platform for virtualization as one can more effectively utilize bigger pools of RAM and cores per NUMA node. We think this is going to be an extremely strong sales point for the AMD EPYC 7702P series, especially for those who are paying for VMware vSphere solutions in a per-socket licensing model.
What the AMD EPYC 7702P does not have is Intel Optane DC Persistent Memory support. If one wants persistent memory, Intel is the best option around.
Next, we are going to look at the AMD EPYC 7702P market positioning before we get to our final words.