Bonus Workloads with the AMD EPYC 7371
As the picture became clear that this was a hotrod CPU, we wanted to validate our assumptions a little more thoroughly before making our conclusion that the AMD EPYC 7371 is now the CPU to get. We wanted to show three different views of the gauntlet we put the chips through.
STH Hosting Analysis
One area that immediately piqued our interest was web hosting. At STH, we use both Intel Xeon E5 and Xeon Scalable CPUs, as well as AMD EPYC. Specifically, for our production cluster, we use AMD EPYC 7351P chips for our single-socket servers and AMD EPYC 7551 chips in our dual-socket servers. Let us be clear, everything runs absolutely fine on our current infrastructure. If we did not review these chips and believe in being willing to use what we recommend, we would not even think about upgrading what we have. At the same time, we have been working on introducing a containerized version of our production infrastructure in the lab for our 2019 test suite. This is a work in progress, and it is very different from a real web hosting scenario since we have 100GbE connectivity from the systems under test to our load generation nodes and everything is being served from a single machine. This is the reason we added the Intel Optane 480GB SSDs into the test beds, just to give us something comparable for our own purposes.
That is fairly impressive, and a large gap upgrade over both the AMD EPYC 7351(P) and the Intel Xeon Gold 6130. There are certainly software tuning opportunities, but this is a real-world example for us using an extremely lightly modified version of STH main site. What this tells us is that we can deploy AMD EPYC 7371 and see direct benefits to our hosting infrastructure by simply replacing chips.
AMD EPYC 7371 Under Duress
We spent a lot of time in the lab watching the AMD EPYC 7371 and trying to push it to lower clock speeds. As some context, the Intel Xeon Silver 4116 is the top end Xeon Silver SKU for Skylake-SP and it tops out at 3.0GHz. The Intel Xeon Gold 5100 range, outside of the quad-core Gold 5122, tops out at 3.2GHz maximum turbo frequency. We wanted to see if we could push the AMD EPYC 7371 to go below its 3.1GHz base clock since that is so fast it is about the speed of the turbo clocks of the Intel Xeon Silver and Gold 5100 ranges.
What we did was we took several of our benchmarks a step further and put the AMD EPYC 7371 “under duress.” Side-by-side we set up the AMD EPYC 7371 system with an AMD EPYC 7351 server. We launched docker containers pinned to each NUMA node on the CPUs to do Monero mining. These containers use around 2MB per thread, the same amount of L3 cache that each AMD EPYC 73×1 CPU has with 16MB L3 cache and 8 threads per die or 64MB total with 32 threads. This is a really nice way on AMD EPYC 7001 series chips to heavily stress the cores and caches. We also launched a netdata container on each.
We could watch the 100% CPU loading and pin frequencies to 3.1GHz and keep the chips out of their turbo frequencies. You can see that the AMD EPYC 7371 generally has a 20% performance advantage in this crypto workload, but we wanted to let the systems thrash.
c-ray Under Duress
We went one step further. We wanted to see what would happen to c-ray when we started to thrash the processors. In the context of a CPU that is getting thrashed, we re-ran side-by-side a c-ray container while the cryptonight activity and the monitoring container were still running.
Side-by-side, one can see the impact of higher clock speeds. We have a 19-24% improvement in performance.
Linux Kernel Compile Under Duress
Whereas the c-ray benchmark is highly constrained to caches and scales extremely well with cores, compiling the Linux kernel uses more of the system and has some parts that are not as well threaded. Here is an example of CPU utilization, by thread, you may see during the compile.
We also used this to see the difference between the AMD EPYC 7371 and AMD EPYC 7351 under duress. Instead of running at maximum boost during the less threaded parts, the CPUs are pushed to their base frequencies during the test.
As you can see, the AMD EPYC 7371 is completing the task around 19.5% faster than the AMD EPYC 7351. Or said another way, the AMD EPYC 7351 takes around 24% more time to complete the task. This is expected given the similar architectures yet higher clock speeds for the AMD EPYC 7371.
That is an absolutely enormous gain within the same generation. During the Intel Xeon E5 era, we sometimes saw 100MHz increments for “refresh” mid-cycle chips that would add 3-5% more performance in a line. AMD is giving us 20%.
A Word on SPECrate2017_int_base AMD EPYC 7371
We seldom publish the results from our SPEC CPU2017 testing. Server vendors and CPU vendors have teams that optimize their platforms using AOCC, icc, and other compilers for their official runs. We understand this and typically we use gcc which provides a lower result that companies do not want to be found in searches and compared to official results of their competitors.
On the other hand, teams that buy 16 core Windows Servers often use SPECrate2017_int_base as a metric for RFP criteria. We are going to address this by simply giving a percentage delta. The AMD EPYC 7371 was about 19% faster in our gcc SPECrate2017_int_base runs over the AMD EPYC 7351. Putting that number into context, that is in-line with our other results so we feel that when official AMD EPYC 7371 numbers are published, they will see a significant speedup versus the AMD EPYC 7351.
Given what we have seen with our gcc numbers, and how they correlate to vendors’ official runs, we expect official SPECrate2017_int_base AMD EPYC 7371 figures will be in the 96 +/- 8% per socket range. That will put it above the Intel Xeon Gold 6142 and Gold 6142M (generally 85-90 per CPU) competition as the fastest 16 core CPUs on the market by a safe margin.
Again, if you are issuing an RFP, you should wait for official numbers from systems vendors, but in Q1 2019 as vendors release their official benchmark figures for the AMD EPYC 7371, the 16 core hierarchy will change with AMD atop of all of today’s x86 16-core per socket offerings.
Next, we are going to discuss power consumption before giving some of our market positioning thoughts and concluding our review.