AMD EPYC 7F52 Benchmarks Review and Market Perspective


AMD EPYC 7F52 Performance

For this exercise, we are using our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have been tracking for years, along with several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable refresh benchmarks, we are also adding a number of our workload testing features to the mix as the next evolution of our platform.

At this point, our benchmarking sessions take days to run, and we generate well over a thousand data points. We also run workloads for software companies that want to see how their software performs on the latest hardware. As a result, what we show here is only a small sample of the data we collect and can share publicly. Our position has always been that we are happy to provide some free data, but we also offer services, such as our DemoEval service, that let companies run their own workloads in our lab. What we provide is an extremely controlled environment where every step is exactly the same and each run happens in a real-world data center, not on a test bench.

We are going to show off a few results, and highlight a number of interesting data points in this article.

Python Linux 4.4.2 Kernel Compile Benchmark

This has been one of the most requested benchmarks at STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then build the standard auto-generated configuration using every thread in the system. We express results in terms of compiles per hour to make them easier to read:
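As a rough illustration of the metric, the sketch below converts one build's wall-clock time into compiles per hour and builds a `make` command line with one job per hardware thread. This is a minimal sketch under stated assumptions, not STH's actual benchmark script: the helper names are hypothetical, and it assumes the kernel configuration has already been generated in the source tree.

```python
import os
from typing import List, Optional

SECONDS_PER_HOUR = 3600.0


def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert one build's wall-clock time into compiles per hour (higher is better)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return SECONDS_PER_HOUR / elapsed_seconds


def kernel_build_command(threads: Optional[int] = None) -> List[str]:
    """Hypothetical helper: a `make` command line with one job per hardware thread.

    Assumes the kernel configuration already exists in the source tree.
    Falls back to os.cpu_count(), which reports logical CPUs (SMT threads included).
    """
    jobs = threads or os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


print(compiles_per_hour(90.0))    # 40.0
print(kernel_build_command(64))   # ['make', '-j64']
```

In practice, the command list could be passed to `subprocess.run(cmd, cwd=...)` and timed with `time.perf_counter()`; reporting a rate rather than a time simply makes "higher is better" consistent across charts.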

AMD EPYC 7F52 Linux Kernel Compile Benchmark

This result may be surprising, but it also sets the tone for the rest of the section. The AMD EPYC 7F52 is not going to offer the lowest cost per unit of performance on these charts. If all we cared about was the hardware cost of a server, the AMD EPYC 7352 and EPYC 7402 would be much better buys.

Instead, we are looking at per-core performance here, which means we want to compare against other 16-core parts like the EPYC 7302. There is a big gap, although nowhere near as large as the difference between the two chips' list prices. That trade-off, a performance gain smaller than the price premium, is the nuance unique to the frequency-optimized segment.
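The comparison above boils down to two normalizations: score per core and score per dollar, plus the performance ratio set against the price ratio. The sketch below is a minimal illustration with hypothetical helper names; the inputs in the usage lines are placeholders, not measured results or real list prices.

```python
from typing import Tuple


def per_core(score: float, cores: int) -> float:
    """Normalize a throughput score by physical core count."""
    return score / cores


def per_dollar(score: float, list_price: float) -> float:
    """Normalize a throughput score by the chip's list price."""
    return score / list_price


def gap_vs_price(score_a: float, price_a: float,
                 score_b: float, price_b: float) -> Tuple[float, float]:
    """Return (performance ratio, price ratio) of chip A relative to chip B."""
    return score_a / score_b, price_a / price_b


# Placeholder inputs only: two 16-core chips where A scores 30% higher
# but costs 3x as much.
perf_ratio, price_ratio = gap_vs_price(130.0, 3000.0, 100.0, 1000.0)
print(f"{perf_ratio:.2f}x performance for {price_ratio:.2f}x the price")
```

When both chips have the same core count, the per-core ratio equals the raw performance ratio, which is why the 16-core EPYC 7302 is the natural reference point here.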

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.

AMD EPYC 7F52 C Ray 8K Benchmark

Here we see another large gap open up over the EPYC 7302, the mainstream 16-core part. One can also look to the Xeon Gold 6242 and Gold 6226R, other mainstream 16-core parts, to get a sense of how Intel Xeons fare.

7-zip Compression Performance

7-zip is a widely used compression/ decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

AMD EPYC 7F52 7zip Compression Benchmark

Here we wanted to draw attention to the EPYC 7371 to EPYC 7F52 comparison. As revolutionary as the EPYC 7371 was when it was released, the EPYC 7F52 is a big improvement.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS we have been working hard to support AVX-512 and AVX2 architectures. Here are the comparison results for the legacy data set:

AMD EPYC 7F52 NAMD Benchmark

Here again, we see a very large gap open up over the other 16-core parts. The EPYC 7352 is a 24-core part, another quirk of the “Rome” naming conventions: the EPYC 7302 and 7352 model numbers may look close, but the EPYC 7352 has 8 more cores. When the EPYC 7F52 gets close to the EPYC 7352, it is nearly making up an 8-core deficit.
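To put the 8-core deficit in numbers: if throughput scaled perfectly with core count, a 16-core chip would need 24/16 = 1.5x the per-core performance of a 24-core chip to match it. The sketch below assumes that linear scaling, which is an idealization; real results also depend on clocks under all-core load, cache, and memory bandwidth. The function name is hypothetical.

```python
def required_per_core_uplift(cores_small: int, cores_large: int) -> float:
    """Per-core speedup the smaller chip needs to match the larger chip's total
    throughput, assuming throughput scales linearly with core count."""
    if cores_small <= 0 or cores_large <= 0:
        raise ValueError("core counts must be positive")
    return cores_large / cores_small


uplift = required_per_core_uplift(16, 24)  # EPYC 7F52 (16C) vs. EPYC 7352 (24C)
print(f"{uplift:.2f}x per core ({(uplift - 1) * 100:.0f}% more)")  # 1.50x per core (50% more)
```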

OpenSSL Performance

OpenSSL is widely used to secure communications between servers and is an important component of many server stacks. We first look at our sign tests:

AMD EPYC 7F52 Open SSL Sign Benchmark

Here are the verify results:

AMD EPYC 7F52 Open SSL Verify Benchmark

Here Intel performs very well. The Gold 6226R is a lower-clocked part. When we see the gap between the Gold 6226R and the EPYC 7302 we get a sense of how this will look with the Gold 6246R.

UnixBench Dhrystone 2 and Whetstone Benchmarks

Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used, so we are including it in this data set. Here are the Dhrystone 2 results:

AMD EPYC 7F52 UnixBench Dhrystone 2 Benchmark

Here are the Whetstone results:

AMD EPYC 7F52 UnixBench Whetstone Benchmark

Here we see the EPYC 7F52 perform well above the Xeon Gold 6240. While the Gold 6240 is a pre-refresh part, it is also an 18-core part, two more cores than the EPYC 7F52.

Chess Benchmarking

Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and now use the results in our mainstream reviews:

AMD EPYC 7F52 Chess Benchmark

In this workload, the EPYC 7F52 opens a large gap over some of its 16- and 24-core competitors. This workload is less sensitive to memory and more sensitive to core count and clock speed.

STH STFB KVM Virtualization Testing

One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
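Because the client application is closed source, we cannot show its harness. The sketch below only illustrates the general shape of an "as many VMs as possible under the SLA" measurement: sweep the number of VMs online, record a latency-style metric for each count, and report the largest count that still meets the target. The function name, the metric, and the sample sweep are all hypothetical, not STH or client data.

```python
from typing import Mapping


def max_vms_within_sla(latency_by_vm_count: Mapping[int, float], sla: float) -> int:
    """Largest tested VM count that meets the SLA, with every smaller tested count also meeting it.

    latency_by_vm_count maps "VMs online" -> a latency-style metric where lower is
    better (hypothetical; the client's real metric is not published). Returns 0 if
    even the smallest tested count misses the SLA.
    """
    best = 0
    for count in sorted(latency_by_vm_count):
        if latency_by_vm_count[count] > sla:
            break  # stop at the first VM count that misses the target
        best = count
    return best


# Synthetic sweep, not measured data.
sweep = {8: 40.0, 16: 55.0, 24: 80.0, 32: 130.0}
print(max_vms_within_sla(sweep, sla=100.0))  # 24
```

Stopping at the first failure, rather than taking the largest passing count anywhere in the sweep, keeps a noisy pass at a high VM count from overstating capacity.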

AMD EPYC 7F52 V Other EPYC 16 Core KVM STFB SLA Virtualization Testing

We first wanted to show the AMD EPYC 16-core lineup here, each configured with 512GB of memory. At some of the larger VM sizes, memory capacity can become a limiting factor. As one can see, the constrained memory bandwidth and smaller L3 cache of the AMD EPYC 7282 hurt it in this workload, while the EPYC 7F52 shows why it is significantly faster than its stablemates.
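Memory capacity puts a hard ceiling on how many VMs can be resident at once, regardless of CPU speed. The sketch below computes that ceiling for the 512GB and 256GB configurations discussed in this section. The 32GB VM size and the 16GB host reservation are hypothetical values chosen for illustration, and the calculation assumes no memory overcommit, ballooning, or page sharing.

```python
def memory_bound_vm_limit(host_memory_gb: int, vm_memory_gb: int, host_reserved_gb: int = 0) -> int:
    """Upper bound on simultaneously resident VMs set purely by memory capacity.

    Assumes no memory overcommit, ballooning, or page sharing.
    """
    usable = host_memory_gb - host_reserved_gb
    if usable <= 0 or vm_memory_gb <= 0:
        return 0
    return usable // vm_memory_gb


# Hypothetical 32GB VMs with 16GB held back for the host.
for host_gb in (512, 256):
    print(host_gb, memory_bound_vm_limit(host_gb, 32, host_reserved_gb=16))
# 512 15
# 256 7
```

Halving host memory roughly halves the ceiling for large VMs, which is consistent with the 256GB configuration struggling on the larger VM sizes while smaller VMs become CPU- and cache-bound instead.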

AMD EPYC 7F52 KVM STFB SLA Virtualization Testing

Here we can see the EPYC 7F52 perform extremely well against a collection of its peers. The large VMs tend to be very memory-constrained, which is why the 256GB EPYC 7F52 configuration performs poorly there. As we move to smaller VM memory footprints, the EPYC 7F52’s large caches help keep the VMs working at an acceptable rate. It cannot rival a 24-core high-end Xeon Scalable Refresh part like the Xeon Gold 6248R, but the exercise here is about performance per core, not necessarily absolute performance.

SPECrate2017_int_base

The last benchmark we wanted to look at is SPECrate2017_int_base performance. Specifically, we wanted to show the difference between our results using icc on Intel Xeon and AOCC on AMD EPYC.

Server vendors get better results than we do, but this gives you an idea of where we are at in terms of what we have seen:
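For readers less familiar with how a SPECrate number is formed: a rate run executes several concurrent copies of each benchmark, each benchmark's ratio is the copy count times the reference machine's time divided by the measured time, and the overall metric is the geometric mean of those ratios. The sketch below shows only that arithmetic. It is simplified (reportable runs select a median from repeated iterations and must follow SPEC's run rules), and the benchmark names and times are placeholders, not real SPEC reference values. The icc vs. AOCC difference changes the binaries being timed, not this scoring math.

```python
from statistics import geometric_mean
from typing import Mapping


def specrate_ratio(copies: int, reference_seconds: float, measured_seconds: float) -> float:
    """Per-benchmark rate ratio: copies * (reference time / measured time)."""
    return copies * reference_seconds / measured_seconds


def specrate_score(copies: int,
                   reference_seconds: Mapping[str, float],
                   measured_seconds: Mapping[str, float]) -> float:
    """Geometric mean of the per-benchmark ratios (simplified: one run per benchmark)."""
    ratios = [specrate_ratio(copies, reference_seconds[name], measured_seconds[name])
              for name in reference_seconds]
    return geometric_mean(ratios)


# Placeholder benchmark names and times, not real SPEC reference values.
ref = {"bench_a": 100.0, "bench_b": 400.0}
run = {"bench_a": 50.0, "bench_b": 100.0}
print(round(specrate_score(2, ref, run), 3))  # 5.657
```

The geometric mean keeps one unusually strong sub-benchmark from dominating the headline figure, which is part of why small compiler differences can still move the final score.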

AMD EPYC 7F52 SPECrate2017_int_base Benchmarks STH Tested Not Official

A few quick and important notes here:

  • These are not vendor official results. For official results, see the official results browser.
  • We are about 1.5-2% behind where AMD is with their estimates in marketing materials. That is close enough that we think that AMD is likely near what server vendors will publish.
  • We did not have dual Intel Xeon Gold 6246Rs for testing. Instead, this is a figure published by Dell using the Dell EMC PowerEdge R740xd platform we reviewed. It is the only official result here and is denoted with an (*).
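The "1.5-2% behind" figure in the notes above is a simple relative difference between our score and a reference estimate. The sketch below shows that calculation with a hypothetical function name; the inputs are placeholders, not AMD's marketing estimates or our measured scores.

```python
def percent_behind(measured: float, reference: float) -> float:
    """How far `measured` trails `reference`, as a percentage of `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (reference - measured) * 100.0 / reference


# Placeholder scores only.
print(percent_behind(98.0, 100.0))  # 2.0
```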

This is perhaps the most important metric server buyers use in this space, and AMD performs very well here, which matches the results we saw in our test suite.

Next, we are going to get into the “so what” and discuss market positioning for the new products before giving our final words.

9 COMMENTS

  1. These are the SKUs that should be thought of as True Workstation class parts and the higher clocks are welcome there along with the memory capacity and the full ECC memory types support.

    So what about asking Dell, HP, and AMD about the potential for 1P variants aimed at the graphics workstation market segment? I’m hoping that Techgage can get their hands on any 1P variants, or even 2P variants in a single-socket-compatible EPYC/SP3 motherboard (with beefed-up VRMs), for graphics workloads testing.

  2. While in the 7F52 each core gets 16MB of L3 cache, that core has only 2048 4k page TLB entries which only cover 8MB. It would be interesting to see how much switching to huge pages improves performance.

  3. Your 16 core intel model is wrong.

    16 cores from a 28 core die could still be set up to have the full 28 cores worth of L3 Cache.

  4. Hi ActuallyWorkstationGrade, if you are going to convince HPE/Dell about a 1-CPU workstation, then IMHO you will have a hard fight, as those parts are more or less competitive with the Xeon W-22xx. If those makers already have their W-22xx workstations, then an EPYC workstation with the same performance will not bring them anything. Compare the benchmark results with the Xeon W-2295 review here on ServeTheHome and you can see for yourself.

  5. Sometimes I read STH for the “what.” In this “review,” the “what” was nowhere near as interesting as the “why.” You’ve got a great grasp on market dynamics.

  6. I have said it before and this release only highlights the need for it:
    You need to add some “few thread benchmarks” to your benchmark suite!
    You are only running benchmarks that scale perfectly and horizontally over all cores. This does not exercise the turbo modes of the cores, nor does it highlight the advantages of frequency-optimized SKUs.

    In the real world, most complex environments are built with applications and integrations that absolutely do not scale well horizontally. They are most often limited by the performance and latency of far fewer than all cores. Please add some benchmarks that do not use all cores. You can use existing software and just limit the number of threads. 4 – 8 threads would be perfectly realistic.

    As you correctly say, the trend with per core licensing will only make this more relevant over time. Best is to start benching as soon as possible so you can build up some comparison data in your database.
