AMD EPYC 7401P Linux Benchmarks and Review – Something Special

AMD EPYC 7401P Cover

Today we have something that we have been waiting to share for some time, a special review of the 24-core AMD EPYC 7401P in single socket configuration. The AMD EPYC 7401P offers a unique value that is largely unprecedented in the server CPU space: pricing under $45 per core and $23 per thread on higher core count parts. Beyond that, the CPU delivers 24 cores and 48 threads to a single socket at a price point under $1100, or about what Intel will sell you a 12 core Skylake-SP part for.

Let us be clear: if you are putting an EPYC 7401 series CPU in a single-socket-only server, get the AMD EPYC 7401P, not the 7401. AMD has extraordinarily aggressive pricing on the EPYC 7401P. If you are buying a dual socket server with a single CPU and plan to add a second CPU later, then the EPYC 7401 may make better sense.

Key stats for the AMD EPYC 7401P (and EPYC 7401): 24 cores / 48 threads, 2.0GHz base and 3.0GHz turbo with a whopping 64MB L3 cache. The CPU features a 170W TDP. Here is the AMD product page with the feature set. Here is the lscpu output for the processor:

AMD EPYC 7401 Lscpu

As part of our work in the sub $1100 CPU space, we have now benchmarked every Intel Xeon Silver and Bronze CPU, and all dual socket Intel combinations sub $2000. We have also completed work with every AMD EPYC performance variant from the EPYC 7251 to the EPYC 7601. This review is part of a massive project to deliver a complete set of results for STH readers, our data subscribers and for companies we are advising. During this review, we will be discussing comparisons between different options. These discussions are informed by running 6kW of systems constantly for months across a variety of workloads, including many we do not publish on STH.

Test Configuration

We are using the AMD EPYC 7401 for our benchmark numbers today. We validated with AMD the assumption that the EPYC 7401P and EPYC 7401 should be identical, or nearly identical, in terms of performance. We were told that the AMD EPYC 7401P may be negligibly faster due to having the dual socket capable circuitry disabled and therefore ever so slightly more power available for the rest of the package. We are going to label the results in our charts EPYC 7401P to make it clear that they are single socket results.

We now have every AMD EPYC SKU tested on a common Tyan EPYC platform and work started on another platform. Here is the base hardware configuration we are using:

  • CPU: AMD EPYC 7401
  • Server Barebones: Tyan Transport SX TN70A-B8026 (B8026T70AE24HR)
  • RAM: 8x 16GB (128GB total) DDR4-2666 RDIMMs (Samsung)
  • SSD: 1x Intel DC S3710 400GB SATA SSD
  • NIC: 1x Mellanox ConnectX-3 Pro EN VPI
Tyan Transport SX B8026T70AE24HR Front And Rear

Key to this system is that it supports 24x U.2 NVMe SSDs without using Broadcom PLX PCIe expanders. That is 96 lanes of PCIe 3.0 directly from a single CPU. One of the key advantages AMD EPYC has is that a single EPYC CPU can use 128x PCIe lanes, the same number as the dual socket configuration. Tyan has responded to this opportunity by offering a single-socket system that can handle 24x NVMe drives plus have I/O available for 10/25/40/50/100GbE.
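The lane arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: the x4 link per U.2 drive is our assumption (it is the usual U.2 width and matches the 96-lane figure), and the function name is hypothetical.

```python
# Figures from the text: 24 U.2 NVMe bays, 128 PCIe 3.0 lanes per EPYC socket.
# Assumption: each U.2 NVMe drive uses a x4 link.
LANES_PER_SOCKET = 128
NVME_BAYS = 24
LANES_PER_NVME = 4  # assumed x4 per drive


def lane_budget(total_lanes, drives, lanes_per_drive):
    """Return (lanes used by storage, lanes left for NICs and other I/O)."""
    used = drives * lanes_per_drive
    if used > total_lanes:
        raise ValueError("drive count exceeds available lanes")
    return used, total_lanes - used


used, remaining = lane_budget(LANES_PER_SOCKET, NVME_BAYS, LANES_PER_NVME)
print(f"NVMe lanes: {used}, remaining for other I/O: {remaining}")
```

The remainder is an upper bound. A real motherboard may dedicate some of those lanes to SATA, management, or onboard devices.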

Tyan Transport SX B8026T70AE24HR Internal 1

AMD and Tyan originally suggested that we use a Samsung SSD (as pictured), however, to aid in consistency, we are using our lab standard Intel DC S3710 400GB SSDs.

AMD EPYC 7401 In Tyan 24 Bay NVMe 2U

In our forthcoming system review, we will have data on every CPU from the AMD EPYC 7251 to the EPYC 7601 for those looking at different options. We are going to try to keep our comparisons as relevant as possible from a price/performance standpoint, but we will also bring in additional data points as needed.

AMD EPYC 7401P Benchmarks

For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then build the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
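For readers who want to reproduce the metric, here is a minimal sketch of how a build time converts to compiles per hour. It is not our Linux-Bench script. The function names are hypothetical, and the timing helper assumes an already unpacked and configured kernel tree.

```python
import os
import subprocess
import time


def time_kernel_build(source_dir):
    """Time one full build with `make -j<threads>`; returns elapsed seconds.

    Assumes source_dir already holds an unpacked kernel tree with a .config.
    """
    threads = os.cpu_count() or 1  # cpu_count() may return None
    start = time.perf_counter()
    subprocess.run(["make", f"-j{threads}"], cwd=source_dir, check=True)
    return time.perf_counter() - start


def compiles_per_hour(elapsed_seconds):
    """Convert one build's wall-clock time into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 3600.0 / elapsed_seconds


# Hypothetical timing for illustration only, not a measured result.
example_seconds = 90.0
print(f"{compiles_per_hour(example_seconds):.1f} compiles per hour")
```

Expressing the result as a rate means higher is better, which keeps the chart consistent with the other benchmarks.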

AMD EPYC 7401P Linux Kernel Compile Benchmarks

Our Linux Kernel compile benchmark shows the performance of the multi-die architecture. Here, AMD’s nearest competition is the 20 core Intel Xeon Gold 6138 priced at over $2600 each. If you want the CPU that is closest on price, that would be the Intel Xeon Silver 4116. When we say EPYC “P” parts deliver performance per dollar, this is a stark example. This particular workload will scale with cores but prefers fewer, bigger dies.

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.

AMD EPYC 7401P C Ray Benchmark

We cut down this comparison significantly from what we would normally use. What we are doing, and will do in all of our charts, is to show at least 16, 24 and 32 core EPYC configurations, both from single and dual socket configurations.

C-ray requires fast caches and does not often push data across cores and Infinity Fabric or Mesh/UPI, so AMD EPYC is particularly strong in this type of workload.

7-zip Compression Performance

7-zip is a widely used compression/ decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

AMD EPYC 7401P 7 Zip Compression Benchmark

Overall great performance by the AMD EPYC 7401P. We did want to pause here and note the dual Intel Xeon Gold 6134 results that we have in these charts. The Intel Xeon Gold 6134 is a CPU that will be popular for per-core licensing workloads. It only has 8 cores/ 16 threads but has a large (for Intel) L3 cache structure as well as high clock speeds with a 3.2GHz base. While it is performing relatively close to the single socket AMD EPYC 7401P, it is a significantly costlier (from a hardware standpoint) setup. When one says Intel has parts optimized for per-core performance, we saw this as a good example to use.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench in the near future. With GROMACS we have been working hard to support both AVX-512 on Intel’s Skylake and AVX2 on AMD’s Zen architecture. Here are the comparison results for the legacy data set:

AMD EPYC 7401P NAMD Benchmark

There is an enormous delta between the AMD EPYC 7401P and the Intel Xeon Silver 4116 or dual Silver 4110s that are about the same price. The $1075 EPYC 7401P is competing more in the realm of the Xeon Gold series here when we are not utilizing AVX-512. Our GROMACS results will show what happens when we utilize the dual FMA AVX-512 on the Xeon Gold 6100 series with a similar type of application.

Sysbench CPU test

Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.

AMD EPYC 7401P Sysbench CPU Benchmark

Again, another solid performance and one that shows that the EPYC 7401P has a competitive case against dual Xeon Silver 4114.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:

AMD EPYC 7401P OpenSSL Sign Benchmark

Here are the verify results:

AMD EPYC 7401P OpenSSL Verify Benchmark

Here we see the Intel Xeon Gold 6138 with 20 cores perform well but the competition again is in the $1500-2600 range for Intel’s CPUs against a $1075 AMD single socket part.

Overall, this is a great price/performance showing for AMD.

UnixBench Dhrystone 2 and Whetstone Benchmarks

Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging, however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:

AMD EPYC 7401P UnixBench Dhrystone 2 Benchmark

Here are the Whetstone results:

AMD EPYC 7401P UnixBench Whetstone Benchmark

We added dual Intel Xeon Silver 4116 results to this chart. The dual Silver 4116 combination is still finishing up longer test runs for the next few days, but we wanted to provide some data points as to where it would fall against the AMD EPYC 7401P. Two Silver 4116 chips have the same TDP and cost about 90% more than the EPYC 7401P.

We think these charts are a great validation that AMD’s single socket performance SKU strategy has merit. It also shows why we like the EPYC 7401P.

GROMACS STH Small AVX2/ AVX-512 Enabled

We have a small GROMACS molecule simulation we previewed in the first AMD EPYC 7601 Linux benchmarks piece. In Linux-Bench2 we are using a “small” test for single and dual socket capable machines. Our medium test is more appropriate for higher-end dual and quad socket machines. Our GROMACS test will use the AVX-512 and AVX2 extensions if available.

AMD EPYC 7401P GROMACS STH Small Benchmark

There are a few things to point out on this chart. First, against the Intel Xeon Silver 4100 (and Gold 5100 for that matter) parts that are in the price range of the AMD EPYC 7401P, the single socket SKU performs well. Without the second AVX-512 FMA, Intel simply does not reap the benefits.

Conversely, the dual Intel Xeon Gold 6134 setup only has 16 cores between the two CPUs. High clock speeds and dual AVX-512 FMA mean big numbers. If you have an AVX-512 heavy workload, get the Intel Xeon Gold 6100 series. Even with that, remember that the Xeon Gold 6132, while faster, is still a $2100 CPU or about twice the cost of the EPYC 7401P.

Chess Benchmarking

Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:

AMD EPYC 7401P Chess Benchmark

We decided to put something special in this chart: every EPYC. All 9 performance variants are represented in the chart as well as two of the EPYC dual socket combinations. When we put the dual EPYC 7601 in the chart, the scale made everything else difficult to read. Still, every EPYC, every Xeon Silver, every price competitive Xeon Silver dual socket configuration, and the top-end Xeon D / Atom C3000 series are all in the same chart.

A Note on Power Consumption

The other side of the equation is power consumption. The AMD EPYC 7401 is putting up some impressive benchmark numbers, but that does have an associated cost. Since in our EPYC 7351P piece we caused confusion by showing a peak power consumption value (peak does not equal 100% load), we are going to break this down more simply:

  • Idle: 79W
  • 70% Load: 217W
  • 100% Load: 268W
  • Peak: 344W

Figures were taken on our APC / Schneider Electric 208V PDU at 17.6C and 72% RH. Our testing window shown here had a +/- 0.3C and +/- 2% RH variance.
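To put those wall-power readings in operating terms, here is a rough sketch that converts each one to energy per year. It assumes the system sits at a single load level around the clock, which is a simplification (especially for the peak figure), and the helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

# Wall power readings from the list above, in watts.
POWER_W = {"idle": 79, "70% load": 217, "100% load": 268, "peak": 344}


def annual_kwh(watts):
    """Energy in kWh if the system drew `watts` continuously for a year."""
    return watts * HOURS_PER_YEAR / 1000.0


for label, watts in POWER_W.items():
    print(f"{label}: {annual_kwh(watts):.0f} kWh/year")
```

Multiplying by your own per-kWh rate, plus a factor for cooling overhead, gives a first estimate of running cost.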

Overall, solid numbers. They are competitive with dual Intel Xeon Silver 4114 and 4116. A single Intel Xeon Silver 4116 tops out under 130W at 100% load so AMD is trading power consumption for performance. Intel simply does not have a performance-optimized part in the sub-$1200 CPU segment.

We are using the AMD EPYC 7401 chip in single socket configuration to simulate the AMD EPYC 7401P for performance number purposes. We were told there may be a slight variance in an EPYC 7401P from a power standpoint.

Market Positioning

To provide some perspective, in the same general price range on the Intel Xeon Scalable side is the 12 core/24 thread Intel Xeon Silver 4116 we recently benchmarked, at $1002 or about $83 per core. By the time you move to the full Intel Xeon Scalable feature set, the Intel Xeon Gold 6138 (20 cores, also 2.0GHz) is around $130 per core.
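The per-core and per-thread figures quoted in this review are simple division, sketched below. The EPYC 7401P and Silver 4116 prices come from the text. The Gold 6138 price is rounded to $2600 as an approximation of the "over $2600" figure, so treat that row as an assumption. The class and field names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    price_usd: float
    cores: int
    threads: int

    @property
    def usd_per_core(self):
        return self.price_usd / self.cores

    @property
    def usd_per_thread(self):
        return self.price_usd / self.threads


CPUS = [
    Cpu("AMD EPYC 7401P", 1075, 24, 48),
    Cpu("Intel Xeon Silver 4116", 1002, 12, 24),
    Cpu("Intel Xeon Gold 6138", 2600, 20, 40),  # price approximated
]

for cpu in CPUS:
    print(f"{cpu.name}: ${cpu.usd_per_core:.2f}/core, "
          f"${cpu.usd_per_thread:.2f}/thread")
```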

AMD EPYC 7401P v. Intel Xeon Silver

At STH, we have about 150 different CPU configurations in the lab and have been covering the server space for eight years. There are very few times when we are at a loss for a comparison. The Intel Xeon Silver 4116 is the CPU in the price bracket that is closest to the AMD EPYC 7401P. It has half the TDP and half the cores. Perhaps the closest comparison is really a dual Intel Xeon Silver 4116 configuration. With that, one can get the same number of cores at about the same frequency. One gets more memory channels (albeit at slower speeds) and almost as many PCIe lanes/SATA III ports. One does have to move to a more expensive dual socket motherboard and the CPU cost is about $900 more; however, it could be a good comparison.

Looking to single-socket Intel Xeon Scalable there is simply no answer. Intel charges significantly more for a CPU the more cores it has. Perhaps the closest CPU Intel has to the EPYC 7401P is the Intel Xeon Gold 6138 with 20 cores, the same 2.0GHz base clock but a much higher turbo boost speed along with dual FMA AVX-512.

AMD Aggressive Volume Play MSRP Base Clock And Cores Comparison

When we map the increase in Intel v. AMD cost for adding more compute on a socket in the single socket market, one can see why the AMD part is priced so competitively and how AMD is changing the single socket game.

AMD EPYC 7401P v. AMD EPYC

Given the pricing, we like the AMD EPYC 7401P versus two AMD EPYC 7251s. In the single socket (P) stack, there is little competition from the dual socket parts. To us, the AMD EPYC 7551P has a strong value proposition as a 32 core part. The 7551P is a $2100 part, or about twice that of the 7401P. In a $10,000 (or more) server, that is a 10% increase in system price for essentially 40-50% more performance.
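The system-price arithmetic in that last sentence can be checked with a short sketch. The $10,000 system price is the illustrative figure used above, the CPU prices are the rounded numbers from the text, and the function name is hypothetical.

```python
def system_price_increase(base_system_usd, old_cpu_usd, new_cpu_usd):
    """Fractional rise in total system price from swapping only the CPU."""
    if base_system_usd <= 0:
        raise ValueError("system price must be positive")
    return (new_cpu_usd - old_cpu_usd) / base_system_usd


# Figures from the text: $10,000 server, $1075 EPYC 7401P, ~$2100 EPYC 7551P.
increase = system_price_increase(10_000, 1075, 2100)
print(f"System price increase: {increase * 100:.2f}%")
```

The more expensive the rest of the server (RAM, NVMe drives, networking), the smaller the CPU upgrade looks as a share of the total.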

With that, we see the main competition within the EPYC line as the 7351P. We think that the AMD EPYC 7351P is a great part. At the same time, $325 for 8 more cores is an awesome deal. For the STH infrastructure, we are looking at the AMD EPYC 7401P.

Final Words

The AMD EPYC 7401P is awesome for single-socket servers. If you are thinking about single or dual Intel Xeon Silver 4116 CPUs, there is no question the AMD EPYC 7401P is a better value from a performance perspective. This value proposition is strong because AMD specifically targeted the market with the P variant. If Intel dropped Xeon Silver 4116 pricing to $700, a dual socket system would be a lot more competitive. As it stands, your CPU cost is $900+ more (over a $1075 part) to get somewhat competitive performance from Intel. The AMD EPYC 7601 is a beastly CPU and the EPYC 7351P is a great value. If we had to pick the most competitive part AMD has, it is the 7401P hands down. There is no competitive part in the market for what the AMD EPYC 7401P has to offer.

Patrick has been running STH since 2009 and covers a wide variety of SME, SMB, and SOHO IT topics. Patrick is a consultant in the technology industry and has worked with numerous large hardware and storage vendors in the Silicon Valley. The goal of STH is simply to help users find some information about server, storage and networking, building blocks. If you have any helpful information please feel free to post on the forums.

9 COMMENTS

  1. Great review. I don’t see how Intel can ignore this, but there will probably have to be a hit to the OEM channel before they act, and product lines change slowly.

    FYI, the Linux kernel compile chart has the Xeon Gold 6138 twice with two different results

  2. Thanks for the review. I think this processor will be used for the next video editing workstation I am going to build. With this workstation I will follow the GROMACS advice and use GPU power to do most of the heavy lifting.
    It’s good to see that the GROMACS website advises using GPUs to speed up the process. GPUs are relatively cheap and fast for these kinds of calculations. Blackmagicdesign Davinci Resolve uses the same philosophy: use GPUs where possible. AMD EPYC has enough PCIe lanes for the maximum of 4 GPUs supported in Windows (Linux up to 8 GPUs in 1 system).

  3. @Bill Broadley
    Davinci Resolve supports up to 4 GPUs under Windows (4x 16 PCIe lanes). I will use 8 PCIe lanes for the DeckLink card (10 bit color grading) and 2×8 PCIe lanes for 2 HighPoint SSD7101A in RAID-0 for realtime rendering/editing etc. (14GB/s sustained read and 12GB/s sustained write, 960 Pro SSDs will be 25% overprovisioned).

  4. I still want to test it in our environment before buying racks of them. This is still helpful. Maybe we’ll buy a few to try

  5. @Girish
    Have a look at the supermicro website (4 node in 2U).
    Maybe 1 node in 1 U is already fast enough.
    I see no reason why AMD EPYC wouldn’t work with OpenStack, AMD is one of the many companies supporting the OpenStack organizations (and they have the hardware to back it up).
