AMD EPYC 7702P Review Redefining Possible at 64C Per Socket

AMD EPYC 7702P Benchmarks

For this exercise, we are using our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have relied on for years, along with several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.

At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.

We are going to show off a few results, and highlight a number of interesting data points in this article.

Python Linux 4.4.2 Kernel Compile Benchmark

This has been one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org and a standard configuration file, then build the standard auto-generated configuration using every thread in the system. We express results in compiles per hour to make them easier to read:

AMD EPYC 7002 Linux Kernel Compile Benchmark Result

While doing this review, I realized that we already had excellent charts for the AMD EPYC 7702P that we used in our initial launch piece. These charts show the Intel Xeon E5/E7 V4 generation in yellow, the first-generation Xeon Scalable in gray, and the second generation in black. First-generation AMD EPYC is in blue while second-generation parts are in green, matching the carriers that the AMD parts use in each generation.

These charts include a large number of dual-socket results alongside single-socket results. The AMD EPYC 7702P appears as a green bar among the “Dual” results, but without the “Dual” label. Here you can see that it is just about at the level of the dual Intel Xeon Platinum 8260 configuration.
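
For readers who want to compare their own timings against our charts, the conversion to compiles per hour can be sketched as below. This is a minimal illustration rather than our actual Linux-Bench script; the function name and the sample timings are assumptions made up for the example, not measured results.

```python
def compiles_per_hour(compile_seconds: float) -> float:
    """Convert the wall-clock time of one kernel compile into compiles per hour."""
    if compile_seconds <= 0:
        raise ValueError("compile time must be positive")
    return 3600.0 / compile_seconds


# Hypothetical timings for illustration only, not results from this review.
for seconds in (60.0, 90.0, 120.0):
    rate = compiles_per_hour(seconds)
    print(f"{seconds:.0f} s per compile -> {rate:.1f} compiles/hour")
```

Expressing the result as a rate means a higher bar is better, which matches how the other charts in this set read.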

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.

AMD EPYC 7002 C Ray 8K Benchmarks

We added the Marvell ThunderX2 numbers here. The single-socket AMD EPYC 7702P is out-performing dual 32-core ThunderX2 CPUs. For the Arm server ecosystem to get competitive, it needs to move to next-gen Arm Neoverse N1 cores in volume production. If one just needs an Intel alternative, AMD has a strong contender at a lower price per performance.

AMD EPYC 7002 C Ray 8K 1P Only Benchmarks

Looking at single-socket results, we can see that the AMD EPYC 7742 single-socket configuration is very strong. There is a benefit to the higher clock speeds. Still, when we say that these 64-core parts are in a different class than competitive single-socket offerings, this is a great example of the step function.

7-zip Compression Performance

7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

AMD EPYC 7002 7zip Compression Benchmarks

Again, thanks to a huge core count and large, fast caches, one gets high-end 64-core performance that outpaces dual 28-core Xeon Scalable CPUs here.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS, we have been working hard to support AVX-512 as well as AVX2 on the AMD Zen architecture. Here are the comparison results for the legacy data set:

AMD EPYC 7002 NAMD Benchmarks

Here we are generally seeing AMD’s per-core performance rival Intel’s when we are not using AVX2 and AVX-512. The AMD EPYC 7702P has more than twice the number of cores per socket.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. This is an important library in many server stacks. We first look at our sign tests:

AMD EPYC 7002 OpenSSL Sign Benchmarks

Here are the verify results:

AMD EPYC 7002 OpenSSL Verify Benchmarks

Here the dual Intel Xeon Platinum 8280 system is faster than the single socket AMD EPYC 7702P. Saying that two $10,000 list price CPUs (over $20,000 total) are slightly faster than a $4,425 list price CPU still does not feel like a win.
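
To put numbers on that comparison, here is a rough sketch of the CPU-only cost ratio using the list prices quoted above. The $10,000 Xeon figure is our rounding, the helper name is made up for this example, and the calculation ignores everything else in the server (memory, motherboards, power).

```python
# List prices as quoted in the text; the Xeon figure is a rounded number.
XEON_8280_LIST_USD = 10_000
EPYC_7702P_LIST_USD = 4_425


def cpu_cost_ratio(sockets_a: int, price_a: float,
                   sockets_b: int, price_b: float) -> float:
    """Return how many times more configuration A costs than B, CPUs only."""
    return (sockets_a * price_a) / (sockets_b * price_b)


ratio = cpu_cost_ratio(2, XEON_8280_LIST_USD, 1, EPYC_7702P_LIST_USD)
print(f"Dual Xeon Platinum 8280 CPU cost is about {ratio:.2f}x a single EPYC 7702P")
```

At these list prices, the dual Intel configuration would need to be roughly 4.5 times faster to match the single EPYC 7702P on CPU cost per unit of performance.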

We also wanted to draw attention to the dual Intel Xeon E5-2699 V3 (brown) results. Here, the single AMD EPYC 7702P is faster than four of the top-end Xeon E5 V3 chips from Q3 2014. If you are replacing five-year-old Xeon E5s, you should get a 4:1 or better socket consolidation ratio with the 64-core EPYC parts. That type of consolidation has enormous TCO implications.
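
A quick way to see what a consolidation ratio means for a fleet is sketched below. The function name and the 40-socket fleet are hypothetical; the 4:1 ratio is our estimate from this one test, so treat the output as illustrative rather than as a sizing guide.

```python
import math


def sockets_after_consolidation(old_sockets: int, ratio: float) -> int:
    """Sockets needed after consolidating at ratio:1, rounded up to whole sockets."""
    if old_sockets < 0 or ratio <= 0:
        raise ValueError("old_sockets must be >= 0 and ratio must be > 0")
    return math.ceil(old_sockets / ratio)


# Hypothetical fleet of 40 older Xeon E5 sockets, consolidated at 4:1.
old_fleet = 40
new_fleet = sockets_after_consolidation(old_fleet, 4.0)
print(f"{old_fleet} old sockets -> {new_fleet} new sockets")
```

Rounding up matters for small deployments: ten old sockets at 4:1 still require three new ones, not two and a half.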

UnixBench Dhrystone 2 and Whetstone Benchmarks

Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging, but we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used, so we are including it in this data set. Here are the Dhrystone 2 results:

AMD EPYC 7002 UnixBench Dhrystone 2 Benchmark

Here are the Whetstone results:

AMD EPYC 7002 UnixBench Whetstone Benchmark

Again we see great results, faster than 64 cores of Marvell ThunderX2 or even first-generation AMD EPYC 7601s.

GROMACS STH Small AVX2/ AVX-512 Enabled

During our initial benchmarking efforts, we found that our version of GROMACS was taking advantage of AVX-512 on Intel CPUs. We also found that it was not taking proper advantage of the AMD EPYC 7002 architecture. From our original AMD EPYC 7002 Series Rome Delivers a Knockout piece:

AMD EPYC 7002 GROMACS STH Small Case Not Zen2 Optimized Benchmark

We have had one of the lead developers on our dual AMD EPYC 7742 machine and changes are being upstreamed. The initial results were putting dual AMD EPYC 7742s at around 2.7x of dual Intel Xeon Gold 6148F parts, which are go-to HPC chips. The results above will change significantly once those changes land, but not in time for this AMD EPYC 7702P review.

Instead of continuing to publish this benchmark, we are going to hold off until later in 2019 when those results get upstreamed. At worst, as shown above, the chips are about even. When properly optimized, they are well ahead of Intel’s offerings.

Chess Benchmarking

Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:

AMD EPYC 7002 Chess Benchmark

Here we can see that Intel Xeon Platinum 8180 and Platinum 8280 parts pull ahead by a nice margin. Still, when we look at the single socket results, we are seeing an enormous Intel to AMD delta.

Also, looking at the top-end dual Xeon E5-2699 V4 chips, one can see the AMD EPYC 7702P is higher than a 2:1 consolidation ratio here. If you are on a three-year replacement cycle, you can consolidate at better than a 2:1 ratio throughout the range. That kind of a number is not available with current generation Xeons.

SPECrate2017_int_base

The first workload we wanted to look at is SPECrate2017_int_base performance. Specifically, we wanted to show the difference between what we get with Intel Xeon icc and AMD EPYC AOCC results. We expect server vendors to get better results than we do, but this gives you an idea of where we are at:

AMD EPYC 7002 SPECrate2017_int_base Benchmark

Overall, SPECrate2017_int_base results are consistent with what we have seen elsewhere. The AMD EPYC 7702P is fast with a massive 256MB L3 cache and 128 threads.

STH STFB KVM Virtualization Testing

One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.

AMD EPYC 7002 KVM STFB Virtualization Workload 1 Benchmark

AMD simply has an awesome platform for virtualization as one can more effectively utilize bigger pools of RAM and cores per NUMA node. We think this is going to be an extremely strong sales point for the AMD EPYC 7702P series, especially for those who are paying for VMware vSphere solutions in a per-socket licensing model.
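
The per-socket licensing point is easy to sketch. Everything in the example below is a placeholder: the license price, the VM count, and the function name are hypothetical and are not taken from any vendor's price list or from our test results. The sketch only shows how the license cost per VM scales with socket count when the same number of VMs stays online.

```python
def license_cost_per_vm(sockets: int, license_usd_per_socket: float,
                        vms_online: int) -> float:
    """License spend per running VM under a per-socket licensing model."""
    if vms_online <= 0:
        raise ValueError("vms_online must be positive")
    return sockets * license_usd_per_socket / vms_online


# Hypothetical inputs for illustration only.
LICENSE_USD_PER_SOCKET = 1_000.0
VMS_ONLINE = 100

single_socket = license_cost_per_vm(1, LICENSE_USD_PER_SOCKET, VMS_ONLINE)
dual_socket = license_cost_per_vm(2, LICENSE_USD_PER_SOCKET, VMS_ONLINE)
print(f"1 socket: ${single_socket:.2f} per VM, 2 sockets: ${dual_socket:.2f} per VM")
```

If a single 64-core socket can keep as many VMs online as a two-socket server, the per-VM license cost is halved under this model.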

What the AMD EPYC 7702P does not have is Intel Optane DC Persistent Memory support. If one wants persistent memory, Intel is the best option around.

Next, we are going to look at the AMD EPYC 7702P market positioning before we get to our final words.

12 COMMENTS

  1. While 7702P is excellent compared to all existing Xeon CPUs it’s not clear if/when it’s better than dual 7452 (which are cheaper, faster, has 2x RAM bandwidth and 2x RAM size limit) except under very significant power constraints (given existence of DP 2U4N servers from several vendors space doesn’t seem to be much of an issue) or if used with paid software licensed per socket.
    Any other common scenario when 7702P is preferable to dual 7452s?

  2. Igor that is something I do want to test (we do not have EPYC 7452’s yet.)

    Single socket you also have lower power consumption per core, and more PCIe lanes per core. You also do not have a socket-to-socket traversal. So there are advantages as well. 2U4N EPYC runs out of space for 16 DIMMs per CPU so you are limited to 16 DIMMs per node.

  3. @Igor,
    You are correct that 2x 7452 give you 2x RAM capacity+bandwidth, and CPU cycles (7452 has higher base clock — 2.9 vs. 2.2Ghz). And while 2×7452 has a smaller cost premium over the 7702 (2x$3,400 vs. $6,450). These are not the only things that need to be considered.

    1 – Power and Cooling costs. TDP of 7702=225w, TDP of 2×7452=450w.
    TDP == $$$, this can add up over time
    2 – System cost. Proportionately, a single CPU system is less expensive than dual CPU system.
    3 – Single CPU is more efficient than dual CPUs for some operations, no inter-CPU traffic.

  4. Dawkins jr – you can. We have this running in the lab and it worked out-of-the-box.

    BinkyTO – the 7702P is $4425 and the 7452’s are $2025 each. The 7542 is $3400 while the 7702 (non-P) is $6450. All list prices.

  5. Linus Tech Tips: “BRO 64 CORES THIS IS COOL”
    STH: “Here is the TCO benefit, what you can replace and why. Here are the market dynamics at play.”

    Thanks for doing something useful.

  6. The Playstation 3 actually has 512MB of RAM, split across two busses. The CELL BE can only directly access 256MB of it but the graphics and audio and some DMA can access all of it.

  7. Hi @Patrick,
    Why “… single socket you also have … more PCIe lanes per core”? It may be true for some pairs of 1-CPU vs 2-CPU motherboards (however in most cases it’s not so) but architecturally it’s the same 128 PCIe 4.0 lanes (or even more – supposedly configurable of up to 160 – lanes for 2-CPU configuration).
    Looking forward to test results of dual 7452s.

  8. This is another great review… But I still wonder why no one talks about the elephant in the room (not the Del C6525 or the Bullsequanna) but the fact that EPYC can now be 8P at one NUMA node per socket…

    It is interesting that with core counts most situation can’t use 8P but HPC will love it… And Rome is seemingly made for 2P sleds… 1024 threads in a 2U4N…?

  9. Great stuff, more nicely detailed information.

    [Although I think the paragraph following the chess graph isn’t reading as you intended.]

  10. And it seems pretty likely that dual 7402 (despite only having 48 cores) has similar or better performance than 7702P (if CPU2017_rate results are any indication)

  11. 64 cores is nice, but I do not have unlimited resources as a private person. So 16 or 24 cores EPYC WS seems the best value.
