UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used, so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
We have already covered the top end of this chart a lot. Instead, look at the ThunderX2 32-core CN9980 part, which shows the Arm quandary. If AMD cannot get a lot of market movement with this much better price/performance and power, then how will Arm get x86 customers to switch ISA? Switching from Intel Xeon to AMD requires essentially just powering up the machines and deploying VMs and containers to the AMD nodes. Switching to Arm can mean building new VM and container images, and porting software. If Arm is ahead, that effort makes a lot of sense. For now, ThunderX2, even at the same thread count, is well behind AMD in performance and power consumption.
The other item we are watching is the Intel Xeon Platinum 8260 (and the similar Gold 6210U by extension) and the AMD EPYC 7402P. If we assume the Xeon Gold 6210U offers similar performance to the Platinum 8260, and that the server buyer is not interested in the additional memory capacity and PCIe expansion capacity of AMD, then the Intel Xeon Gold 6210U may be extremely competitive at this price point. While AMD and its partners are pushing the single-socket story, Intel is not, which means we do not see the Xeon Gold 6210U often, but we need to mention it.
GROMACS STH Small AVX2/AVX-512 Enabled
We have a small GROMACS molecule simulation we previewed in the first AMD EPYC 7601 Linux benchmarks piece. In Linux-Bench2 we are using a “small” test for single and dual-socket capable machines. Our GROMACS test will use the AVX-512 and AVX2 extensions if available.
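Whether a given x86 Linux machine exposes those extensions can be checked from the CPU feature flags the kernel reports. The following is a minimal sketch and not part of Linux-Bench2: the function name `best_simd_level` is hypothetical, and it assumes the `avx2` and `avx512f` flag names as they appear on the `flags` lines of `/proc/cpuinfo`.

```python
def best_simd_level(cpuinfo_text: str) -> str:
    """Return the widest SIMD level advertised in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists CPU features on lines such as "flags : fpu ... avx2"
        if line.lower().startswith("flags"):
            _, _, value = line.partition(":")
            flags.update(value.split())
    if "avx512f" in flags:  # AVX-512 Foundation subset
        return "AVX-512"
    if "avx2" in flags:
        return "AVX2"
    return "none"


# Illustrative input only, not captured from any of the systems tested here
sample = "flags\t\t: fpu sse2 avx avx2\n"
print(best_simd_level(sample))  # AVX2
```

On a real system, the text would come from reading `/proc/cpuinfo`. Arm kernels report a `Features` line instead of `flags`, so this sketch returns `none` there.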
We have an updated version of GROMACS we are still running regressions on. Zen 2 has a newer AVX2 path that this benchmark is not fully taking advantage of. At first, the plan was to take it out of the result set. We do not like that inconsistency. There is a lot of value in showing legacy application performance without using all of the new architectural improvements. GROMACS is the type of HPC application that will be optimized for Zen 2 and AMD EPYC 7002 CPUs, but we still thought this result was interesting.
Here, the AMD EPYC 7002 series is getting two primary performance bumps. First, it is seeing some benefit from the newer architectural improvements. You can see that by comparing the dual EPYC 7601 and EPYC 7502 numbers. The newer AMD EPYC 7502 has more performance at the same core count and a lower cost. Second, one can see the impact of having 64-core CPUs.
TDP does not equal power consumption. The Intel Xeon Platinum 8280 system was using 40% more power than the AMD EPYC 7742 system here. The Intel Xeon Scalable family is well known for pushing higher power consumption for AVX-512 heavy workloads.
Even with AVX-512 and better optimizations, the Intel Xeon chips are about on par with their AMD counterparts, yet use more power to deliver similar performance. We look at this as more of a worst-case scenario for EPYC 7002 and it is still competitive.
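The efficiency gap follows from simple arithmetic. As a rough sketch, assuming performance is exactly equal (the text says only "about on par") and taking the 40% power figure above at face value, relative performance per watt works out as follows. The function name is ours, for illustration.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Performance per watt of system A relative to system B.

    perf_ratio  = performance of A / performance of B
    power_ratio = power draw of A / power draw of B
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Assumptions: equal performance (1.0) and 40% more power (1.4)
xeon_vs_epyc = relative_perf_per_watt(perf_ratio=1.0, power_ratio=1.4)
print(f"{xeon_vs_epyc:.2f}")  # 0.71
```

Under those assumptions, the Xeon system delivers roughly 71% of the EPYC system's performance per watt in this workload.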
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Here, the new AMD EPYC 7002 series is performing very well. The dual Intel Xeon Platinum 8280 / 8180 configurations are coming in between the single AMD EPYC 7702P and EPYC 7742 results.
The AMD EPYC 7402P is showing a massive performance uplift over the EPYC 7401P. In that segment of the market, Intel provided a 20-40% performance uplift from 2017 to 2019 and AMD is doing the same. Intel is getting its performance increases from higher clock speeds and more cores. AMD is using a similar core count but with architectural improvements and higher clock speeds.
The first workload we wanted to look at is SPECrate2017_int_base performance. Specifically, we wanted to show the difference between what we get with Intel Xeon icc and AMD EPYC AOCC results. We expect server vendors to get better results than we do, but this gives you an idea of where we are at:
As you can see, the overall performance of the solution is excellent. These numbers are important. They are often used as performance standards in RFPs for server buys. Here, AMD simply obliterates its top-end competition even using pre-production servers.
We also wanted to note that results for the Intel Xeon Platinum 8280 are a bit below what large vendors who have teams dedicated to producing these benchmarks can get (e.g. Cisco 1P and Lenovo 2P Platinum 8280). We wanted to show our numbers since they tend to be a bit lower than what is on the official SPEC website. Vendors do more tuning and have more resources than we do. For your RFPs, please use official numbers. We are in the ballpark of what these vendors get, within a few percent, but we are a bit lower. Even our AMD EPYC 7002 numbers are at the lower end of the range we would expect from vendor benchmarks.
Still, these are absolutely monster results. Just for comparison, a dual Intel Xeon E5-2650 V4 is around 105 here. Even in the midrange of a 3-year-old server stack, we are seeing a 6:1 or more socket consolidation ratio from AMD. If you are using VMware per-socket licensing, this is at the point where you can see day 1 cost savings by removing a 3-year-old server cluster, and continue to save more over time. We have almost never seen that in the industry.
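The consolidation arithmetic can be sketched as below. The only measured figure used is the roughly 105 quoted above for the dual Xeon E5-2650 V4. The helper names are hypothetical, and the sketch only derives the per-socket score a replacement system needs to reach a 6:1 socket consolidation ratio.

```python
def per_socket_score(system_score: float, sockets: int) -> float:
    """Split a whole-system throughput score evenly across its sockets."""
    if sockets <= 0:
        raise ValueError("sockets must be positive")
    return system_score / sockets


def consolidation_ratio(new_per_socket: float, old_per_socket: float) -> float:
    """How many old sockets one new socket can replace at equal throughput."""
    return new_per_socket / old_per_socket


old = per_socket_score(105, sockets=2)  # dual Xeon E5-2650 V4, about 105 total
needed = 6 * old                        # per-socket score required for 6:1
print(old, needed)  # 52.5 315.0
```

This treats the rate score as dividing evenly across sockets, which is a simplification. Licensing savings under a per-socket model scale with the same ratio.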
Compared to the IBM Power9, things get interesting. While we still maintain that Power9 is in a different segment, here a quad-socket Power9 40 core, 8 threads per core machine is hitting just shy of 400. That is not too far from the dual 32-core AMD EPYC 7502.
This was mostly an exercise to see if AMD was going to be able to compete on the common RFP criterion of SPECrate2017_int_base, which many enterprise users utilize. Our sense, given what we are seeing, is yes. Again, wait for vendor-published benchmarks on production firmware for your RFP.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
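The client's application and test harness are closed source, so the following is purely a hypothetical illustration of the idea of finding the largest VM count that still meets an SLA. It assumes completion time never improves as more VMs are added, and both `max_vms_within_sla` and the toy timing model are invented for this sketch.

```python
from typing import Callable


def max_vms_within_sla(
    completion_time: Callable[[int], float],
    sla_seconds: float,
    upper_bound: int,
) -> int:
    """Largest VM count in [0, upper_bound] whose completion time meets the SLA.

    Assumes completion_time(n) is non-decreasing in n, so binary search applies.
    """
    lo, hi = 0, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if completion_time(mid) <= sla_seconds:
            lo = mid       # mid VMs still meet the SLA; try more
        else:
            hi = mid - 1   # too slow; try fewer
    return lo


# Toy model (not measured data): 10 s base time plus 0.5 s per online VM
toy_model = lambda n: 10.0 + 0.5 * n
print(max_vms_within_sla(toy_model, sla_seconds=60.0, upper_bound=256))  # 100
```

In a real test, `completion_time` would be replaced by an actual measurement at each VM count, which is far more expensive than the toy function used here.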
This is a bit of an eye chart. As we would expect, the dual AMD EPYC 7742 platform performed well. We also saw uplift on the dual AMD EPYC 7502 solution, which has more raw memory bandwidth than the single-socket AMD EPYC 77042 solution. Here, the combination of clock speed and more DDR4 channels is leading to a better virtualization solution.
AMD simply has an awesome platform for virtualization as one can more effectively utilize bigger pools of RAM and cores per NUMA node. We think this is going to be an extremely strong sales point for the AMD EPYC 7002 series.
The company also has a CPU-light back-end workload that is mostly dependent on Redis performance and memory capacity, with less of a CPU stressor. Our STH KVM STFB Workload 2 essentially showed that, at a given DIMM size, AMD EPYC 7002 would perform better than Intel Xeon Scalable. With Intel Optane DCPMM, the results favored Intel. AMD does not have a direct Optane DCPMM competitor at this point, and we did not have enough 64GB DDR4-3200 DIMMs to get into higher-capacity testing. We are going to revisit these once the new DIMMs arrive.
Next, we are going to look at the network performance using PCIe Gen4 as well as power consumption.