Second Generation Intel Xeon Scalable Compute Performance
We are going to get into our full benchmarking rhythm shortly. We already have three 4P configurations tested, and eight 1P and 2P configurations are in their test beds now. We wanted to give some sense of performance using our stable Linux-Bench2 suite where we can tightly control dependencies. In this section, we are going to show some examples of 2nd gen Intel Xeon Scalable performance versus the 1st generation. We are going to add in some competitive benchmarks from other architectures. Our primary focus will be on generational SKU-level changes.
As we discussed earlier, a major focus for Intel was bringing significantly more performance to the heart of the SKU stack where most chips are sold. In this section, you will see the impacts of those SKU stack changes.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file and the Linux 4.4.2 kernel from kernel.org, and we make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
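For readers who time their own builds, the conversion to compiles per hour is simple arithmetic. The following is a minimal sketch, not the Linux-Bench2 implementation: the function name `compiles_per_hour` and the 120-second timing are hypothetical placeholders, not measured results.

```python
def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one kernel build into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # One hour is 3600 seconds, so a faster build yields a larger number.
    return 3600.0 / elapsed_seconds


# Placeholder timing for illustration only; this is not a measured result.
example_seconds = 120.0
print(f"{compiles_per_hour(example_seconds):.1f} compiles/hour")
```

Expressing the result this way turns a "lower is better" time into a "higher is better" rate, which reads more naturally on a bar chart.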
Here we show some fairly solid generation-over-generation gains at the top end of the SKU stack. We wanted to give this view before adding older Xeon E5 generations and AMD EPYC to the mix.
AMD is still price competitive here; however, the gap is closing. As an example, the Intel Xeon Gold 5220 now offers more performance at a lower price than the AMD EPYC 7451. Intel is using its yields to change the narrative.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
AMD likes to use Cinebench and c-ray as examples of where its architecture shines. It still does so here, although we see the Linux kernel compile benchmark as a much stronger indicator of real-world performance.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
For the remainder of this section, we wanted to focus on two sets of SKUs: the dual Intel Xeon Gold 6230 and Gold 6130, and the Xeon Gold 5220 and 5120. While halo SKUs are great, these are the higher-volume parts in the market, and the ones most buyers will evaluate. Here, one can see what increasing clock speeds and core counts do for performance.
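When comparing pairs such as the Gold 6230 against the Gold 6130, it can help to express the change as a percentage. The sketch below assumes a "higher is better" score; the function name `generational_uplift` and the sample scores are hypothetical and are not taken from our results.

```python
def generational_uplift(new_score: float, old_score: float) -> float:
    """Return the percent change from old_score to new_score.

    Assumes a "higher is better" metric such as compiles per hour.
    For "lower is better" metrics (for example, render time in seconds),
    swap the arguments or convert the times to rates first.
    """
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score / old_score - 1.0) * 100.0


# Placeholder scores for illustration only; these are not measured results.
old_gen, new_gen = 100.0, 150.0
print(f"{generational_uplift(new_gen, old_gen):+.1f}%")
```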
OpenSSL is widely used to secure communications between servers. It is an important component of many server stacks. We first look at our sign tests:
Here are the verify results:
We see a similar picture on the OpenSSL side. The new chips add cores and perform very well.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
The small bars are quite important. Intel increased the Turbo speeds here, which means single-threaded performance has improved. We again see the same multi-threaded performance increases that we have seen before.
GROMACS STH Small AVX2/ AVX-512 Enabled
We have a small GROMACS molecule simulation we previewed in the first AMD EPYC 7601 Linux benchmarks piece. In Linux-Bench2 we are using a “small” test for single and dual socket capable machines. Our medium test is more appropriate for higher-end dual and quad socket machines. Our GROMACS test will use the AVX-512 and AVX2 extensions if available.
This is perhaps our most intriguing result. The Intel Xeon Gold 5220 absolutely demolishes our AVX-512 enabled benchmark. On the Intel Xeon Gold 5100 series, including the Gold 5120, Intel limited AVX-512 to single port FMA. That left a large gap between the Gold 5100 and Gold 6100 series. With the new 2nd Generation Intel Xeon Scalable Processors, the Gold 5220 performs so much better that it appears as though these chips have dual port FMA AVX-512 enabled.
Update 2 April 2019: We asked Intel for confirmation. The Intel Xeon 5222, like the Xeon 5122, has dual port AVX-512 active. It seems like our test chips may have had this feature. We are going to get a new retail set and re-test our AVX-512 findings.
Next, we are going to discuss some of the memory changes, followed by a discussion on Intel Optane DC Persistent Memory and an example of how that can help performance. We will then discuss security hardening and Intel DL Boost. We will end with a discussion of the new Intel Xeon Platinum 9200 series followed by our final thoughts.