AMD EPYC 7402P Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task was simple, we have a standard configuration file, the Linux 4.4.2 kernel from kernel.org, and make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
The performance of the AMD EPYC 7402P is easy to plot among other AMD EPYC offerings, but it is difficult to plot against Intel Xeon CPUs. What is the comparison we should use? The Intel Xeon Platinum 8260 is perhaps the most directly competitive. It is a $4700 part, but Intel now sells a variant called the Intel Xeon Gold 6212U. The Intel Xeon Gold 6212U has the same features as the Platinum 8260, but it is limited to single-socket operation and costs around $2000. You can read about that part’s genesis in: Intel Xeon Follows AMD EPYC Lead Offering Discounted 1P Only SKUs. The Xeon Platinum 8280 is everywhere while the Gold 6212U is harder to purchase usually as a special order part.
We could use dual CPUs, but that means Intel is disadvantaged with two NUMA nodes instead of one like AMD. It also will generally mean Intel is using significantly more power and will cost more. The upside is that it will have closer to the amount of AMD EPYC 7402P I/O, but still will fall a bit short.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
This is not the best cross-architecture benchmark, however, we have been using it for a long time and it scales well up to hundreds of threads. AMD tends to do well here because this is taking advantage of EPYC’s low level caching.
When the AMD EPYC 7401P was the new chip on the market, we assumed that the architectural improvements were responsible for its outsized gains versus Intel. Our hypothesis was that we would see the same general performance on the AMD EPYC 7402P but perhaps a few percent faster. Instead, we are seeing what would have been a ~30 core hypothetical AMD EPYC 7001 part here with the AMD EPYC 7402P.
7-zip Compression Performance
7-zip is a widely used compression/ decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Here we brought in some both Intel Xeon Silver as well as Intel Xeon D results. There is a segment of the market that relies upon many small nodes in clusters. If you need a compact form factor, the AMD EPYC 7402P does not make sense on a node-for-node basis. The AMD EPYC 3000 series is your best bet there. See AMD EPYC 3000 Coverage From STH Your Guide for our comprehensive coverage.
Still, we are at the point here where the single socket AMD EPYC 7402P will replace four Intel Xeon D-2141I nodes at lower initial cost and power consumption. Even if memory capacity is the concern, the AMD EPYC 7402P supporting up to 4TB means AMD can consolidate entire clusters of these smaller nodes into a single machine. That has trickle-down impacts of requiring fewer network switch ports and PDU ports increasing TCO benefits.
If you have an office or home lab, or a small colocation cluster that is using 250-300W in multiple nodes, consolidating to a single AMD EPYC 7402P may make sense.
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS we have been working hard to support AVX-512 and AVX2 supporting AMD Zen architecture. The Zen2 version for the AMD EPYC 7002P series is not complete yet, (more on that later) so this is actually a better comparison point right now. Here are the comparison results for the legacy data set:
In this chart, we pulled in a dual Intel Xeon E5-2690 V4 result. These were $2000 each upper mid-range 14 core and higher clock speed CPUs that were the current generation only about 18 months prior to this review. In two years, AMD has delivered $4000 of dual-socket CPU power for around one third the price, in a single socket, with more PCIe I/O, PCIe bandwidth, RAM capacity, and with lower power consumption.
That is a key reason the AMD EPYC 7002 series is game-changing. Instead of delivering incremental performance, AMD is delivering such value that the TCO calculations, even at their lower-end, may make sense to see 1 or 2 year old systems ripped and replaced to get the operational savings. At a minimum, it makes those platforms harder to re-order.
OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:
Here are the verify results:
Again we find the AMD EPYC 7402P trading performance core-for-core with the Intel Xeon Platinum 8200 line. With the new and higher clock speeds of the AMD EPYC 7402P, combined with the larger caches, AMD has closed the per-core performance gap that existed in the previous generations.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging, however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the whestone results:
Again we see the AMD EPYC 7402P fall somewhere between the Intel Xeon Platinum 8268 and Platinum 8260. The AMD EPYC 7402P cannot claim that it is the fastest 24-core part. The base models of those chips for Intel are around $4700 to $6300, without accounting for the additional I/O (2nd CPU needed) or additional memory capacity (M or L SKU needed.) Our sense is that a $1250 CPU providing the same number of cores and per-core performance that is in the $5000-5500 range is still a compelling story for AMD. AMD may not be able to match the Platinum 8268, but also uses less power, has more memory capacity and I/O so it is not an outright win for the Platinum 8268.
GROMACS STH Small AVX2/ AVX-512 Enabled
During our initial benchmarking efforts, we have found that our version of GROMACS was taking advantage of AVX-512 on Intel CPUs. We also found that it was not taking proper advantage of the AMD EPYC 7002 architecture. From our original AMD EPYC 7002 Series Rome Delivers a Knockout piece:
We have had one of the lead developers on our dual AMD EPYC 7742 machine and changes are being upstreamed. The initial results were putting dual AMD EPYC 7742’s at around 2.7x of dual Intel Xeon Gold 6148F parts which are a go-to HPC chip. The above will shift significantly once this is changed, but not in time for this AMD EPYC 7402P review.
Instead of continuing to publish this benchmark, we are going to hold off until later in 2019 when those results get upstreamed. At worst, as shown above, the chips are about even. When properly optimized, they are well ahead of Intel’s offerings at a given price point.
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
We added dual Intel Xeon E5-2697 V3 processors to this mix to go back to a 3-5 year replacement cycle. Here, the AMD EPYC 7402P is able to nearly match a virtually top-end dual-socket system from the Intel Xeon E5-2600 V3 era. This is what bringing competiting back to the market looks like.
We also wanted to point out that while Intel delivered around 30% more performance at a given price point with the 2nd generation Intel Xeon Scalable mid-range CPUs (e.g. Silver 4200, Gold 5200, and Gold 623x), they did so largely by adding cores. You can see our Intel Xeon 6230 review as a great example of this. AMD’s new architecture has yielded a similar gain at the same core count which you can see by comparing the EPYC 7401P and the AMD EPYC 7402P.
The first workload we wanted to look at is SPECrate2017_int_base performance. Specifically, we wanted to show the difference between what we get with Intel Xeon icc and AMD EPYC AOCC results. We expect server vendors get better results than we do, but this gives you an idea of where we are at:
We had this from our original AMD EPYC 7002 Series Rome Delivers a Knockout piece, but it had a number of the comparison points we wanted, including the EPYC 7402P. You can see it compares well even on industry-standard workloads used in RFPs such as SPECrate2017_int_base. Again, we urge you to check your vendor’s benchmarks for actual RFPs since OEMs get better numbers than we do.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
We wanted to show an example of where the AMD EPYC 7402P was not on top. Here, thre are two drivers. One is that the Intel systems are dual socket with more cores. Second, the 8x 32GB (256GB) configuration was the lowest we tested. The key takeaway here is that a single AMD EPYC 7402P system with 256GB RAM is less than half the price of the dual Intel Xeon Platinum 8260 system, yet offers more than half the performance. In the world of vSAN, Ceph, and other software defined storage solutions used for hyper-converged clusters, adding more nodes with more storage and networking I/O can help the overall system TCO.
Next, we are going to look at the AMD EPYC 7402P market positioning and give our final words.