AMD EPYC 7371 Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org along with a standard configuration file, and build the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
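As a rough illustration of the compiles per hour metric, here is a minimal sketch of the conversion. It assumes the underlying measurement is the wall-clock time of one full build, in seconds. The function name and the example input are hypothetical and are not taken from the Linux-Bench scripts.

```python
def compiles_per_hour(wall_seconds: float) -> float:
    """Convert the wall-clock time of one full kernel build into compiles per hour."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 3600.0 / wall_seconds


if __name__ == "__main__":
    # Illustrative input only, not a measured STH result.
    example_seconds = 90.0
    rate = compiles_per_hour(example_seconds)
    print(f"{example_seconds:.0f} s per build = {rate:.1f} compiles/hour")
```

Expressed this way, higher is better, and a shorter build time shows up as a longer bar on the chart.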
This is an extraordinary result. Our Linux kernel compile benchmark is not just bound by multi-threaded performance. One can see that single-threaded performance and even memory bandwidth impact the results.
Here we can see the AMD EPYC 7371 not only passes the Intel Xeon Gold 6130 by a significant margin, but it also enters the AMD EPYC 24-core performance levels. This is a great example of why clock speed matters.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
Here you can see that the AMD EPYC 7371 puts a fairly significant delta between itself and the AMD EPYC 7351P based on clock speed. As a benchmark, c-ray is highly sensitive to core counts, clock speeds, and cache differences. We do not like using it across different architectures from the same vendor or from multiple vendors as it exaggerates performance gains. When AMD showed off the first EPYC “Rome” generation demo using c-ray versus Intel Xeon Scalable, this was done to show off what you are seeing above. AMD has an architectural advantage here.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Our custom has been sorting this chart based on decompression MIPS. In this generation, that favors the AMD EPYC over Intel Xeon Scalable results so we ask our readers to take a critical look at this one.
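To show why the sort column matters, here is a minimal sketch of ordering 7-zip results by one MIPS column or the other. The record type, the function name, and the rows are all hypothetical placeholders. None of the values are measured results.

```python
from typing import NamedTuple


class SevenZipResult(NamedTuple):
    system: str
    compress_mips: float
    decompress_mips: float


def sort_for_chart(results, key="decompress_mips"):
    """Return results ordered highest-first by the chosen MIPS column."""
    return sorted(results, key=lambda r: getattr(r, key), reverse=True)


# Placeholder rows, not measured data.
placeholder = [
    SevenZipResult("system_a", compress_mips=3.0, decompress_mips=1.0),
    SevenZipResult("system_b", compress_mips=1.0, decompress_mips=3.0),
    SevenZipResult("system_c", compress_mips=2.0, decompress_mips=2.0),
]

by_decompress = [r.system for r in sort_for_chart(placeholder)]
by_compress = [r.system for r in sort_for_chart(placeholder, key="compress_mips")]
```

With these placeholder rows the two orderings are exact reverses of each other, which is why it is worth reading both the compression and decompression bars rather than only the chart order.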
One point that the chart draws out is that this is a high-speed 16 core per socket part. The Intel Xeon Gold 6134 and Gold 6144 (we do not have the Gold 6144) are very fast chips. If you had a 16 core per machine limit, then these would certainly be in the running, as you can see from the dual Xeon Gold 6134 results we brought in above. The Intel Xeon Gold 6130 cannot keep up with the AMD EPYC 7371 here, which suggests that Intel’s biggest competition to the AMD EPYC 7371 may actually be a dual socket server with 8 cores per CPU.
Intellectually, that statement should give you pause. Is a dual Intel Xeon 8-core server the same as a single socket AMD EPYC 7371? Extrapolating, is a quad Intel Xeon 8-core server equivalent to a dual AMD EPYC 7371 server? These are questions that the AMD EPYC 7371 raises. We did not have to ask them when we could look at Intel’s frequency optimized parts and AMD’s prior lack thereof and dismiss the notion out of hand. AMD is forcing enterprises to start asking these questions by delivering a solid frequency optimized CPU.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench in the near future. With GROMACS, we have been working hard to support AVX-512 on Intel’s Skylake and AVX2 on AMD’s Zen architecture. Here are the comparison results for the legacy data set:
As we are going to see in our AVX2/AVX-512 GROMACS results, instruction set support has a big impact on molecular simulation. At the same time, when we look at raw compute performance, the AMD EPYC 7371 can open up a big gap over the Intel Xeon Gold 6130 due to higher base clock frequencies. The dual Intel Xeon Gold 6134 solution is very competitive, but again this compares dual socket frequency optimized Intel to single socket frequency optimized AMD.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing. Here we are going to use the single threaded test.
One way to look at this chart is that AMD is dramatically better than Intel. That is not a great way to interpret it. This is one of those cases where AMD has an architectural advantage that skews results. Instead, we wanted to focus on clock speed for a second. One can see that the AMD EPYC 7371 provides a significant jump in single-thread clock speed over previous AMD EPYC chips. At the same time, one can see that the Intel Xeon Gold 6100 series generally tops out around 3.7GHz, save for a few SKUs, while the Intel Xeon Gold 5100 and Xeon Silver lines have lower single-thread performance.
The trick to the AMD EPYC 7371 performance is not just high single-core speeds. Instead, it is maintaining higher clocks throughout the range of cores and threads used.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important component in many server stacks. We first look at our sign tests:
Here are the verify results:
When it comes to 16 core CPUs, the AMD EPYC 7371 is at the top of our list. We do not have the Intel Xeon Gold 6142, which has a 500MHz higher base clock but the same maximum turbo clock speed as the Gold 6130. With that, we think Intel would squeeze out a victory over the AMD EPYC 7371 on our OpenSSL benchmarks.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone numbers:
Here the AMD EPYC 7371 is very competitive with dual Intel Xeon Gold 6134 processors. In single-threaded Dhrystone 2, the figures are within the margin we use to describe a dead heat due to testing variations.
We wanted to take a second for those who are wondering about the Gold 6130 versus the dual Gold 6134, and why they have the same maximum turbo clocks yet are so far apart in performance on these tests. The base clock and all core turbo clocks on the Gold 6134 are higher. While one gets similar single-threaded performance, the dual Gold 6134 CPUs are able to maintain higher turbo clocks over more cores. That is also what makes the AMD EPYC 7371 performance so interesting, with a 700MHz clock increase over the EPYC 7351(P) CPUs.
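To put the 700MHz figure in relative terms, here is a small worked sketch. It assumes the increase applies to the base clock and that the EPYC 7351(P) base clock is 2.4GHz. That base clock comes from public specifications and is an assumption here, not a figure stated in this article. The function and constant names are illustrative.

```python
def clock_uplift_percent(old_mhz: float, new_mhz: float) -> float:
    """Relative clock increase from old_mhz to new_mhz, in percent."""
    if old_mhz <= 0:
        raise ValueError("old_mhz must be positive")
    return (new_mhz - old_mhz) / old_mhz * 100.0


# Assumption: 2.4GHz base clock for the EPYC 7351(P), not stated in the article.
ASSUMED_7351P_BASE_MHZ = 2400.0
# From the article: a 700MHz increase over the EPYC 7351(P).
INCREASE_MHZ = 700.0

uplift = clock_uplift_percent(
    ASSUMED_7351P_BASE_MHZ, ASSUMED_7351P_BASE_MHZ + INCREASE_MHZ
)
print(f"{uplift:.1f}% higher base clock")
```

Under that assumption the uplift works out to a bit over 29%, which is a useful upper bound to keep in mind when reading the purely clock-bound results above.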
GROMACS STH Small AVX2/AVX-512 Enabled
We have a small GROMACS molecule simulation we previewed in the first AMD EPYC 7601 Linux benchmarks piece. In Linux-Bench2 we are using a “small” test for single and dual socket capable machines. Our medium test is more appropriate for higher-end dual and quad socket machines. Our GROMACS test will use the AVX-512 and AVX2 extensions if available.
With dual port FMA AVX-512, the Intel Xeon Gold 6130 and the rest of the Intel Xeon Gold 6100 and Platinum 8100 lines are monsters in this test. At the same time, we wanted to show that when you go below the Gold 6100 line, the AMD EPYC 7371 with AVX2 is quite competitive. Intel defeatures its mainstream Gold 5100 and Silver 4100 parts both in terms of clock speed and AVX-512 performance. The AMD EPYC 7371 will not see competition below the Intel Xeon Gold 6100 line in this generation.
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Here one can again see that the step function which formerly separated AMD’s 16 core and 24 core CPUs has been smoothed out. The AMD EPYC 7371 bridges this gap and puts the Intel Xeon Gold 6130, one of Intel’s competitive higher clocked 16 core offerings, well behind.
Since this is a more significant piece than most, we wanted to give a little more color aside from our standard tests so we are going to have some bonus workloads before moving into our tests.