AMD EPYC 7002 Series Rome Delivers a Knockout

AMD EPYC 7002 CPU Performance

For this exercise, we are using a mix of our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have been using for years, as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.

At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.

We are going to show off a few results, and highlight a number of interesting data points in this article.

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org along with a standard configuration file, and make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:

AMD EPYC 7002 Linux Kernel Compile Benchmark Result

Probably the first thing that will stick out at you from this chart is the colors. We tried using one color per CPU generation throughout this series. The second item of note is that we did bring quad-socket Intel Xeon solutions into this chart. Frankly, we had to. Without the quad-socket solutions being added, Intel would have been too far behind.

This is perhaps one of the more interesting results we had. If you have a big CI/CD cluster, get AMD EPYC 7002 series CPUs and do not look back. The test itself is not fully multi-threaded, so there are segments that are limited by single-core performance where Intel does well. For example, the dual AMD EPYC 7601 top-bin figures were below many of the dual 2nd generation Intel Xeon Scalable configurations we tested. At the same time, massive caches, more memory bandwidth, and more cores mean that the AMD EPYC 7002 series does more than keep up. It is not just an Intel Xeon Platinum competitor. Instead, the question is whether AMD has a 2:1 consolidation ratio over Intel Xeon Platinum. These are astounding results.
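
For readers who want to reproduce the compiles-per-hour metric, here is a minimal sketch of the arithmetic in Python. It is not the Linux-Bench script itself; the function names and the `make -j` invocation are assumptions for illustration, and the 90 second figure is a made-up input, not a measured result.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one full build into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_build(source_dir: str) -> float:
    """Time one `make` run using every hardware thread (hypothetical helper).

    Assumes a configured kernel tree already exists in `source_dir`.
    """
    jobs = os.cpu_count() or 1
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=source_dir, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative input only: a build that takes 90 seconds.
    print(compiles_per_hour(90.0))  # prints 40.0
```

Expressing the result as a rate rather than a time means a higher bar is always better, which makes charts with very different systems easier to compare.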

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.

AMD EPYC 7002 C Ray 8K Benchmarks

Here, we added the dual-socket Marvell ThunderX2 CN9980 configuration with 32 cores and 128 threads per socket. The newer Intel Xeon Platinum dual-socket parts are able to compete well against these. AMD EPYC 7002 is on a different level. As you may be able to see with the dual AMD EPYC 7601 top-bin last generation results, AMD traditionally fares well with this. Here the AMD EPYC 7702P with 64 cores in a single socket at $4425 is ahead of dual Intel Xeon Platinum 8280 parts.

If we take a look at single-socket only benchmarks, here is what the picture looks like:

AMD EPYC 7002 C Ray 8K 1P Only Benchmarks

The AMD EPYC 7402P with 24 cores is priced as an Intel Xeon Gold 5218 competitor. With more cores, it is able to push ahead being over twice as fast. That is a big deal. The Intel Xeon Gold 6209U is a $1350 version of the Intel Xeon Gold 6230 CPU that is single socket only. Here, the AMD EPYC 7402P is about two thirds faster yet 10% less expensive.
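
To put that price and performance gap into one number, here is a minimal sketch of the relative performance-per-dollar arithmetic. It uses only the rounded ratios quoted above ("about two thirds faster", "10% less expensive"), so treat the output as an approximation; the function name is hypothetical.

```python
def relative_perf_per_dollar(speedup: float, price_ratio: float) -> float:
    """Performance per dollar of chip A relative to chip B.

    speedup:     A's performance divided by B's performance.
    price_ratio: A's price divided by B's price.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return speedup / price_ratio


# Rounded figures from the text: EPYC 7402P versus Xeon Gold 6209U.
speedup = 1 + 2 / 3      # "about two thirds faster"
price_ratio = 1 - 0.10   # "10% less expensive"

print(f"{relative_perf_per_dollar(speedup, price_ratio):.2f}x")  # prints 1.85x
```

Under those rounded inputs, the AMD part delivers roughly 1.85 times the c-ray performance per list-price dollar.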

As we are going to continue to see, 64 cores of AMD EPYC 7002 are just on a different level than Intel is playing at in socketed CPU designs.

7-zip Compression Performance

7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

AMD EPYC 7002 7zip Compression Benchmarks

Many cores, bigger caches, more memory bandwidth, and a newer microarchitecture all help AMD quite a bit here. Intel asked us to compare a similar number of cores to be fair to their chips. In that spirit, the dual AMD EPYC 7502 configuration has 32 cores each while the dual Intel Xeon Platinum 8280 only has 28 cores each. AMD does not have 28 core SKUs.

On the other hand, the dual AMD EPYC 7502 is faster and uses SKUs with a $2600 list price each while the Intel Xeon Platinum 8280 is a $10,007 list price SKU. AMD provides four more cores per socket which help performance, but they are doing so at an initial list price discount of around 74%. That is a big deal if you assume Intel Xeon Platinum list pricing is designed so server vendors can utilize large 60% discounts. AMD still has the better platform, but we think here the Xeon Platinum 8280 can be very competitive with an 80% discount off of list price.
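
The discount figures above are simple to check. Below is a minimal sketch of the arithmetic using the list prices quoted in this section; the helper names are ours for illustration, and the 60% and 80% discount levels are the hypothetical scenarios from the text, not known street prices.

```python
def discount_vs(reference_price: float, price: float) -> float:
    """Fractional discount of `price` relative to `reference_price`."""
    return 1.0 - price / reference_price


def price_after_discount(list_price: float, discount: float) -> float:
    """Price paid after taking `discount` (0.0 to 1.0) off of list."""
    return list_price * (1.0 - discount)


EPYC_7502_LIST = 2600.0    # per-socket list price quoted above
XEON_8280_LIST = 10007.0   # per-socket list price quoted above

# AMD list price expressed as a discount from the Intel list price.
print(round(discount_vs(XEON_8280_LIST, EPYC_7502_LIST) * 100))  # prints 74

# What the Xeon Platinum 8280 would cost at the two discount levels discussed.
print(round(price_after_discount(XEON_8280_LIST, 0.60)))  # prints 4003
print(round(price_after_discount(XEON_8280_LIST, 0.80)))  # prints 2001
```

At a 60% discount the Xeon Platinum 8280 would still cost well above the EPYC 7502 list price, while an 80% discount brings it below.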

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS, we have been working hard to support AVX-512 as well as AVX2 on the AMD Zen architecture. Here are the comparison results for the legacy data set:

AMD EPYC 7002 NAMD Benchmarks

We wanted to show more SKUs here. First off, on these CPUs this is highly unoptimized. We are not using AVX2 or AVX-512 and we are using gcc. At the same time, it is important. Software runs in data centers for years. Not everything uses the latest instructions, or even can. There are countless legacy applications out there. With unoptimized code, there is an enormous uplift for the newer generation AMD EPYC 7002 series. The AMD EPYC 7402P and EPYC 7401P have the same core count, yet the new chip is about 25% faster.

Intel delivered huge performance gains with this generation as we saw in our Intel Xeon Gold 5218 Benchmarks and Review earlier this week. AMD is delivering enormous gains at the same core count as well. Here, when the software is not well optimized for either modern architecture, having more brute power to handle the workload helps. That is why the Platinum 8280 with its 28 high-speed cores is faster than the AMD EPYC 7402P, and why we call the AMD EPYC 7702P a 2-socket replacement part. It is simply that much faster.

Next, we are going to continue with our benchmarks starting with OpenSSL.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:

AMD EPYC 7002 OpenSSL Sign Benchmarks

Here are the verify results:

AMD EPYC 7002 OpenSSL Verify Benchmarks

OpenSSL is one of the most used functions in modern architectures. If you are reading this, or virtually anything else online about the AMD EPYC 7002 series, you are doing so over an HTTPS connection using OpenSSL.
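
For readers who want a rough look at sign and verify throughput on their own hardware, here is a minimal sketch that wraps the stock `openssl speed` tool from Python. This is not the Linux-Bench test itself: the choice of `rsa2048` and the helper names are assumptions for illustration, and results will not be directly comparable to the charts above.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_speed_command(algorithm: str = "rsa2048",
                        threads: Optional[int] = None) -> List[str]:
    """Build an `openssl speed` command line that runs `threads` parallel workers.

    `rsa2048` is an assumed algorithm choice, not necessarily what the review used.
    """
    workers = threads or os.cpu_count() or 1
    return ["openssl", "speed", "-multi", str(workers), algorithm]


def run_speed(algorithm: str = "rsa2048", threads: Optional[int] = None) -> str:
    """Run the benchmark and return its raw text output (sign/s and verify/s)."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl binary not found on PATH")
    result = subprocess.run(build_speed_command(algorithm, threads),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call run_speed() to actually benchmark.
    print(" ".join(build_speed_command(threads=4)))
```

The `-multi` option is what lets the test scale across cores, which is why core count matters so much in these results.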

The top of the chart is fairly self-explanatory. AMD has more cores, and that leads to better performance. Dual Intel Xeon Platinum 8280s perform slightly better than the AMD EPYC 7702P. The AMD EPYC 7402P is slightly faster than the Intel Xeon Platinum 8260 but they are fairly close. The Intel Xeon Platinum 8260 has a single-socket counterpart, the Intel Xeon Gold 6210U with a $1500 list price. Intel has higher clock speeds while AMD has more cores, cache, and memory bandwidth. The cost of the two chips is very similar.

One other quick note here on power consumption. The dual Intel Xeon Platinum 8280 configuration was using over 230W more than the AMD EPYC 7702P server, and about 200W more than the single AMD EPYC 7742 server here. That can be partially attributed to using four additional DIMMs. We have heard cloud vendors use $6 per watt as their 1W TCO for gear. Also, we expect to see the single socket AMD EPYC server itself sold for several hundred dollars less than the dual-socket Intel server. We think Intel is still very competitive here with Xeon Platinum 8280 street pricing in the $1000-1200 range.
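
To translate those power deltas into dollars, here is a minimal sketch that applies the $6 per watt TCO figure mentioned above. The function name is hypothetical, and the per-watt number is the hearsay figure from the text, so treat the output as a rough estimate rather than a real cost model.

```python
def power_tco_delta(extra_watts: float, dollars_per_watt: float = 6.0) -> float:
    """Estimated TCO cost of drawing `extra_watts` more power.

    `dollars_per_watt` defaults to the $6 per watt figure quoted in the text.
    """
    return extra_watts * dollars_per_watt


# Power deltas quoted above for the dual Xeon Platinum 8280 server.
print(f"${power_tco_delta(230):,.0f}")  # vs. EPYC 7702P server, prints $1,380
print(f"${power_tco_delta(200):,.0f}")  # vs. EPYC 7742 server, prints $1,200
```

Part of that delta comes from the four additional DIMMs, so the CPU-only difference would be somewhat smaller.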

A Quick Note on OpenSSL and Intel QAT Accelerators

Being fair here, Intel has its QuickAssist technology which can accelerate OpenSSL. You can see our Intel QuickAssist Technology and OpenSSL Benchmarks and Setup Tips piece as well as our Intel QuickAssist at 40GbE Speeds: IPsec VPN Testing to see the impact. Intel has Lewisburg PCH options with built-in QAT. Instead of making this a universal accelerator for Xeon, Intel’s decision to put the functionality only into higher-spec PCHs has thwarted mainstream server adoption. Intel will point out that QAT is widely adopted in the telecom space where Xeon D and Atom chips can have this built-in.

Intel Atom C3xxx QAT Device

On the other hand, the Dell PowerEdge, HPE ProLiant, Supermicro Ultra, Lenovo ThinkSystem, and Inspur N-series Xeon Scalable servers we have in the lab all use lower-cost PCH options without QAT. Server vendors that include QAT-enabled PCHs tend to also route extra PCIe lanes from the CPU to the PCH, which has a similar impact to using an add-in accelerator.

Intel can do better here with QAT. However, one needs to integrate a QAT accelerator into their stack. One also needs to ensure their stack supports the correct QAT version. Although we have had working QAT stacks, it is far from a universal plug-and-play experience.

Next, we are going to continue with more benchmarks.

57 COMMENTS

  1. Absolutely amazing. I still can’t believe the comeback AMD has made in just a few years. From a joke to toppling over the competitor for the top position in what, 3 odd years?

    Definitely going to get this for our next server build. Major props to AMD.

  2. “Intel does not have a competitive product on tap until 2020.”
    Cooper Lake is not remotely competitive with Rome, much less its actual 2020 competitor Milan.

    Highly unlikely Intel will be close to competitive until its Zen equivalent architecture on its 7nm node.

  3. Wow! I’ve been holding out upgrading my E5 v3-generation server, workstations, and render farms in my post-production studio because what has been available as upgrades seemed so incremental, it was underwhelming. And now here comes Rome and the top SKU is performing 5-6X faster than an E5-2697 v3! Maybe a weird comparison, but specific to me. I’m thinking back to some painfully long renders on recent jobs and imagining those done 5x faster…

    I would really, really love to see some V-Ray or even Cinebench benchmarks. I know I’m not the target market, but I’m not alone in wanting this for media & entertainment rendering and workstation use. Any chance you could run some for us?

    Also, what Rome chip would you need for a 24x NVMe server to make sure the CPU isn’t the bottleneck?

    Great work, as always. Thank you!

  4. Intel’s got Ice Lake too. I’d also wager that Patrick and STH know more about Intel’s roadmap than most.

    Ya’ll did a great job. Using CPU 2017 base instead of peak was good. I thought it was shady of AMD to use peak in their presentations.

    I’d like to see sysbench come back.

  5. Most OEMs will have no problems with moving to Rome but Apple is in a tough situation with their Intel partnership, aren’t they? How can they market Xeon generational improvements when others will be talking about multiplying performance and a substantial relative price decrease?

  6. Take a look at the top of dual socket systems in the SPECrate2017_int_base benchmark here:
    Supermicro already posted a 655 base with 7742’s to top the charts.

  7. Wizard W0wy – we applied patches, however:
    1. We left Hyper-threading on. I know some have a harder-line stance on if they consider HT on a fully-mitigated setup.
    2. We did not patch for SWAPGSAttack. AMD says they are already patched or not vulnerable here. Realistically, SWAPGSAttack came out the day before our review and there was no way to re-run everything in a day.

    Tyler Hawes – we have the Gigabyte R272-Z32 shown on the topology page. That will handle 24x U.2 NVMe but that will be a common 2U form factor in this generation. CPU selection will depend on NIC used, software stack, and etc., but that is a good place to investigate.

  8. Awesome article STH

    I would love to see some more latency test, Naples had some issues with latency sensitive workloads in part due to the chiplet design. So, will you guys test it out in the future?

    And more database tests?

  9. You did mention you would talk more about 3rd Gen EPYC? I don’t think I saw it anywhere in the article. Will it be out to compete with Ice Lake? What are the claims so far?

    Thanks for the great article! Best I’ve read so far.

  10. I’m also disappointed in the lack of a second gen 7371 SKU. Our aging HP GL380p G8 MSSQL server is due for a replacement, and I don’t want to have to license any more cores. Per-core performance really shines considering $7k/core. It would feel wrong to deploy without PCIe Gen 4; I might drop a 7371 into one of the new boards (if I can get any vendor support) and swap it when the time comes.

  11. I appreciate the amount of work you have done in compiling all this information. Thank you, and well done.

    Also, well done to AMD! What an amazing product they have delivered. Truly one of the greatest leaps in performance-per-dollar we have seen in recent years.

  12. Hello Patrick,
    There was a Gigabyte converged motherboard layout (H262-Z66) floating out that showed 4 Gen-Z 4C slots coming from the CPU. There were rumors of Gen-Z in Rome going back to the Summer of 2018; Is there anything you can tell us about that?

  13. Hi guys, taking my wife to the hospital in 30 minutes for surgery. Will try to get a few more answered but apologies for the delay later today. She broke her elbow (badly.) Thank you for the kind comments.

    Jesper – it is a bit different in this generation. When you are consolidating multiple sockets, or multiple servers, into a single socket, your latency comparison point becomes different as well. We have data but tried to manage scope for the initial review. We will have more coming.

    Luke – Milan is coming, design complete, 7nm+ and the same socket. AMD said the Rome socket is the Milan socket.

    Billy – I think AMD’s problem is that there is so much demand for their current stack, some of those SKUs did not make the launch. I am strongly implying something here.

    Michael Benjamins – 2P 7742 was 27005 without doing thread pinning. There is a lot more performance there. Also, Microsoft Windows Server 2019 needed a patch (being mainlined now) to get 256 threads to boot. I am not sure if I want to show this before we get a better tuned result. Even with this, R20 hits black screen to fully rendered in ~12 seconds. Cores were under 40-98% load for <10 seconds with R20. I actually think R20 needs a bigger test for a 256 thread system.

  14. I’m not sure I understand the paragraph about Intel putting pressure on OEMs. What exactly should not be named/disclosed? Can someone please explain the meaning to me?

    Sounds like the typical and shady anti competitive measures Intel is known for.

    p.s. I hope this is not a double post, but I got no indication if my previous submit worked or not.

  15. Quick question on the successor to Snowy Owl? Have we got an ETA, or will AMD simply pop Ryzen in its place, like ASRock have done?

  16. This is f@#$ing great work. You’ve covered high-level, deep technical, business and market impact, with numbers and practical examples like your load gen servers that are great. I’ve read a few of the other big sites but you’re now on a different level.

  17. To anyone that’s new I’ll reiterate what I said on the jellyfish-fryer article

    Patrick’s the Server Jesus these days.

    He’s done all the server releases and they’re reviewing all the servers

  18. Okay. My criticism was this looked really long. I started reading yesterday. Finished today. Why’d AMD have to launch so late????

    After I was done reading I was totally onboard with your format. You’ve got a lot of context interjected. I’d say this isn’t as sterile as a white paper, but it’s ultra valuable.

    Now get to your reviews on CPUs and servers.

  19. @Youri and another Epyc system from Gigabyte already beat the SuperMicro one at your link 😉

    R282-Z90 (AMD EPYC 7742, 2.25GHz)

  20. I’m thinking you should submit this to some third tier school and call it a doctoral thesis for a PhD. That was a dense long read. I’ve been reading STH since Haswell and I’ll say that I really like how you’ve moved away from ultra clinical to giving more anecdotes. I can tell the difference reading STH over other pubs. This is deep and thorough.

  21. What vendor can accept the first orders for the systems with AMD EPYC 7002 (configurator ready) and is able to ship let’s say within next 2-3 weeks?

  22. I am so glad I waited until today to read this, when I could sit down and read at my leisure. Thank you Patrick and team. This is why I read STH.

  23. “2. Customers to change behavior”

    This is likely not what AMD can do since there is no medicine or medical operation available to fix stupidity!

    Stupidity can’t be fixed by others except people themselves!

  24. Mike Palantir,
    During the event, I thought I recalled the HP rep stating they had systems available for order today.

  25. FYI Rumour rag, WCCF claimed to fact check your statistics!

    “Warning: some of the numbers below are simply absurd.

    ServeTheHome reviewed the top-end 64 core dual socket and found that “AMD now has a massive power consumption per core or performance advantage over Intel Xeon, to the tune of 2x or more in many cases.”

    The new EPYC parts have a massive I/O advantage with 300% the memory capacity versus Xeon 33% more memory channels (8 versus 6) and finally 233% more PCIe Gen3 lanes. But what about actual performance?”

  26. This is probably a dumb question but are there any vendors that will be selling individual chips (not systems) within the next quarter or two? And who would the best vendor be?

    Thanks

  27. guys.. remember that both AMD and us as customers do owe TSMC a lot. Without TSMC all this would probably be not possible today.

  28. Never mind my previous comment. Newegg is selling the processors and is already on back order to the end of August for most of the desirable SKU’s.

  29. Patrick thank you for the informative article and all the great work you and your team do. Also would like to thank the STH readers for their article comments and posts in the forums. This is one of very few sites where I actually enjoy reading what other people think and say…

    And thank you for the nudge nudge wink wink information with regards to the 7371 style SKUs. I have an application that processes in a very serial fashion and it benefits from higher megahertz vs core quantity, though 16 cores is perfect for the SQL and other tasks on the machine. I’m excited about the new NUMA architecture and I’m looking forward to whatever is next.

    Best wishes and a speedy recovery to your wife!

  30. @Billy
    Epyc 7542 would probably match or beat the 7371 in most lightly threaded tasks.
    @lejeczek
    What can TSMC make that Samsung couldn’t?

  31. Amazing writeup Patrick, once again! Beamr is proud to be a Day 1 application partner as the only company focused on video encoding. As a result of this amazing achievement by AMD, on the Gen 2 EPYC we were demonstrating at the launch event 8Kp60 HDR live HEVC video encoding on a single socket of a 7742.

    And as a result of having 64 high performance cores, because we are heavily optimized for parallel operation, all cores were utilized at 95% or above! Beamr is super excited to have this level of performance available to our first tier OTT streaming customers and large pay TV operators.

    AMD has broken through on so many levels with this new processor generation that I understand why you feel the need to even go deeper with your analysis and review after writing an “epic” 11k word article.

  32. Great look at the next big thing… After it all, I can only ask if with FINALLY a 1 node socket is there any talk of 4P or 8P…
    The thought of 512C\1024T in a 4U is like dreams come true… And if the rumors of SMT4 turn out to be true (EUV does give 20% more density and power-savings) 512C/2048T could do most heavy jobs in one box…
    And it does change the landscape since the progression from 8C to 64C covers basically 100% of the market.. The market doesn’t care if they need 1P or 8P, they only care about the areas where AMD is excelling…
    Another interesting area I’m not seeing a lot of is Edge Computing… This should seal the deal with an 8 or 16C that can have 6 NICs and an Instinct for AI inferencing…

    Love the site… Looking at bare metal in the future…

  33. So what they’ve figured out that other sites haven’t yet, is the whole consolidation story. That 4 Xeon E5-2630 V4 to 1 epyc really resonates.

  34. It will be interesting to see how long it is before VMware and other companies start adjusting their licensing to reflect future market trends. Software companies have investors to please too. If Intel doesn’t have anything to compete by the time prices start going up then it could cause a huge wave of companies switching to AMD for the simple fact that their licensing would be too expensive otherwise. The other thing they could do is switch everything to per core licensing, which would give Intel a slight advantage or possibly just a tie once you factor in the total cost. I bet you big changes are coming though. No company could survive having their revenue cut to 1/6 its original value in a couple of years.

  35. So this is me just thinking about this some more. It will also be interesting to see the impact this could have on interest in open source alternatives. Costs jumping 2-6x are the kind of events that get people to start looking into alternatives.

  36. Colby, vmware changing its licensing to per core after appearing and praising rome on stage together with amd, would be one of the top3 stupidest move this industry has ever seen. Not impossible, but highly improbable.

  37. Yeah but in my experience when it comes to looking like an idiot and having to explain to your investors and wall street analysts why your revenue stream has been cut in half most CEOs would prefer to look like an idiot. After all the CEO owns a good portion of the company as well. I don’t necessarily think it will be all at once but instead of a 3% annual increase we may start seeing 10-15%. They also may be hoping that due to the cost reduction allowed in Rome that they will see more customers coming in looking to virtualize since it will be cheaper. Another thing that could potentially go VMware’s way would be if customers just started giving more resources to each vm since they aren’t as constrained by their licensing anymore. Instead of dual core vms with 4GB of ram now everyone gets

  38. …everyone gets 4 cores and 8GB with the benefit to the company being added productivity. Nothing happens in a vacuum in business but the question is what factors are going to prevail the most.

  39. Just joining in for the thanks. The most thorough and in-depth review on the net I’ve found so far.

    Also, Patrick, I wish your wife quick and full recovery. So you can get back to benchmarking, that is 😉

  40. I’m surprised at how inexpensive the lower core count 1P processors are. Are these practical in a high end CFD workstation or for other compute intensive workstations ?

    Someone needs to compare the Ryzen 9 3950X ($750) with the soon to be released 16 core Zen2 with the 7302P ($825). Can’t believe a 16 core Rome EPYC is only $75 more than the R9 ! The 16 core Zen 2 has to be priced between these 2 devices, maybe $800 ?

    With the 7502P (32 cores) selling for $2300, I guess we know the upper end of the price on the Zen2 32 core Threadripper.

    Another thing to keep in mind is that Zen3 products will be shipping in 15 months or so. They will surely push down the price/performance curve even further. Zen 3 will be 7nm EUV, which should be 20% higher density, lower power consumption and faster clock speeds. Zen 3 Ryzen should be 32 core, TR should be 128 core, EPYC should be 128 or even 256 core !

  41. @Nobody I’m also really curious about the suitability of these chips for a workstation and how they compare to threadripper. Patrick thought the clock speeds on gen 1 EPYC chips were too slow before the 7371 was released.

  42. Devastating. Adding the fact that second generation is compatible to SP3 and vendors have v2-enabled BIOSes out there already is a serious hit. Good job, AMD
