Cavium ThunderX2 OpenSSL Performance
Switching to our more standard test suite, we are going to show results using gcc. We are using OpenSSL 1.1.0g, the default for Ubuntu 18.04 LTS at its release, and showing RSA2048, RSA3072, and RSA4096 results for a wider array of CPUs.
We added the 24-core AMD EPYC 7451, the Intel Xeon E5-2699 V4 and the Intel Xeon Gold 6152 to these charts to expand the base a bit more.
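For readers who want to try a comparable run, here is a minimal sketch of driving OpenSSL's built-in `speed` benchmark for the three key sizes. The helper names and the choice of `-multi` to fan out across threads are our assumptions for illustration, not a description of the exact harness behind the charts.

```python
import subprocess

# Key sizes discussed in the text.
RSA_ALGORITHMS = ["rsa2048", "rsa3072", "rsa4096"]


def build_speed_command(parallel_jobs: int) -> list:
    """Build an `openssl speed` command line (hypothetical helper).

    `-multi N` asks openssl to run N benchmark processes in parallel.
    Using one job per hardware thread is an assumption, not the
    article's documented configuration.
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    return ["openssl", "speed", "-multi", str(parallel_jobs)] + RSA_ALGORITHMS


def run_speed(parallel_jobs: int) -> str:
    """Run the benchmark and return its raw text output (needs openssl installed)."""
    completed = subprocess.run(
        build_speed_command(parallel_jobs),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Calling `run_speed(256)` on a dual-socket system would start one job per hardware thread. The sign and verify rates appear in the summary table at the end of the output.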
On the verify side, the Cavium ThunderX2 leapfrogs AMD’s top-bin 24-core EPYC 7451 in terms of performance. There are algorithms that the Cavium ThunderX2 is significantly better at, but we wanted to show that even in an area where it is less well suited, the Cavium ThunderX2 still performs in the ballpark of the higher-end Intel and AMD offerings. The other aspect to keep in mind here is that the AMD EPYC 7451 is the second least expensive current generation chip on these charts and is still around 50% more expensive than the ThunderX2 CN9980.
In summary, this is not a good workload for Cavium, but it is still competitive in nominal performance while winning on a price/performance basis.
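To make the price/performance point concrete, here is a small sketch of the arithmetic. The only figure taken from the text is the roughly 50% price premium of the EPYC 7451 over the ThunderX2 CN9980. The function names are ours, and no benchmark scores or list prices are assumed.

```python
def perf_per_dollar(score: float, price: float) -> float:
    """Benchmark score divided by chip price (higher is better)."""
    return score / price


def break_even_fraction(price_premium: float) -> float:
    """Fraction of the pricier chip's performance the cheaper chip needs
    in order to tie on performance per dollar.

    price_premium is how much more the rival costs, e.g. 0.50 for 50%.
    """
    return 1.0 / (1.0 + price_premium)


# With a rival that costs about 50% more, the cheaper chip ties on
# price/performance at two thirds of the rival's score.
print(round(break_even_fraction(0.50), 3))  # 0.667
```

In other words, under that price gap the ThunderX2 only needs to deliver a bit more than two thirds of the EPYC 7451's result to come out ahead per dollar.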
Cavium ThunderX2 Family c-ray 8K Performance
We had the opportunity to change our 32 core chip to other core counts in order to emulate different SKUs. We decided to emulate the 30 core / 120 thread ThunderX2 CN9978, the 28 core / 112 thread ThunderX2 CN9975 and the 24 core / 96 thread ThunderX2 CN9975. Cavium gave us the capability and said that performance should be correct albeit with higher power consumption. We did not have the time to run everything across the SKU stack for launch but we wanted to give some relative idea of performance between the SKUs.
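The core and thread pairs above follow from four hardware threads per core. A quick sketch of that arithmetic, with a helper name of our own choosing:

```python
# Four SMT threads per core, as implied by the 30/120, 28/112 and 24/96
# core/thread pairs quoted in the text.
THREADS_PER_CORE = 4


def hardware_threads(cores_per_socket: int, sockets: int = 1) -> int:
    """Total hardware threads for a given core count and socket count."""
    return cores_per_socket * THREADS_PER_CORE * sockets


for cores in (32, 30, 28, 24):
    single = hardware_threads(cores)
    dual = hardware_threads(cores, sockets=2)
    print(f"{cores} cores: {single} threads per socket, {dual} in a dual-socket system")
```

The 32-core part in a dual-socket system gives the 256 threads referenced throughout this review.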
This is an updated c-ray benchmark we have been using for years now to simulate an 8K render. It is highly dependent on high core/thread counts and cache speeds. AMD EPYC has ruled this benchmark since its introduction.
Overall, the Cavium ThunderX2 scaled well from 96 to 256 threads, which is about what we would expect to see in this test.
Cavium ThunderX2 Compression Performance
Another benchmark we have used for years is 7zip for compression/decompression performance. We wanted to show a result, and then show why Cavium ThunderX2 is different.
That decompression speed absolutely crushes AMD EPYC 7000 and Intel Xeon Scalable numbers. We even added the Intel Xeon Platinum 8180 and a few quad Intel Xeon E7 numbers into the mix to give a sense of just how big that number is. The performance of Cavium ThunderX2 can be very competitive with x86 offerings.
This result brings up a great point on Cavium ThunderX2. 256 threads is a lot, especially with 64MB of cache. Just like we see performance oddities on four-socket systems that reach over 200 threads on the x86 side, sometimes you run into situations where the software simply cannot make use of that many threads. We ran the test one extra time, without counting the result, and just watched htop. This is what we saw:
As is our standard, we run this with SMT set to its maximum. At the same time, this is a case where the cores are not being utilized evenly, and that is hurting performance.
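For readers who would rather quantify that imbalance than eyeball htop, here is a minimal sketch that compares two snapshots of Linux's `/proc/stat` and lists the logical CPUs that stayed mostly idle. The function names and the 10% threshold are our own assumptions, and this is not the tooling used for the run above.

```python
def parse_proc_stat(text: str) -> dict:
    """Return {cpu_name: (idle_ticks, total_ticks)} for per-CPU lines."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-CPU lines look like "cpu7 ..."; skip the aggregate "cpu" line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal
        fields = [int(value) for value in parts[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
        times[parts[0]] = (idle, sum(fields))
    return times


def utilization(before: dict, after: dict) -> dict:
    """Busy fraction (0.0 to 1.0) per CPU between two snapshots."""
    result = {}
    for cpu, (idle_after, total_after) in after.items():
        idle_before, total_before = before.get(cpu, (0, 0))
        delta_total = total_after - total_before
        delta_idle = idle_after - idle_before
        result[cpu] = 0.0 if delta_total <= 0 else 1.0 - delta_idle / delta_total
    return result


def mostly_idle_cpus(util: dict, threshold: float = 0.10) -> list:
    """CPUs whose busy fraction stayed below the (assumed) threshold."""
    idle = [cpu for cpu, busy in util.items() if busy < threshold]
    return sorted(idle, key=lambda name: int(name[3:]))


def read_snapshot(path: str = "/proc/stat") -> dict:
    """Read one live snapshot on a Linux system."""
    with open(path) as handle:
        return parse_proc_stat(handle.read())
```

Taking one `read_snapshot()` before and one during the benchmark, then passing both to `utilization()` and `mostly_idle_cpus()`, gives a rough count of how many of the 256 threads the workload left unused.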
Cavium ThunderX2 UnixBench Performance
UnixBench is a benchmarking suite that, frankly, is showing its age. However, it is interesting from a few points of view. First, we get a lot of readers who request the whetstone/dhrystone 2 results. Second, it is a suite conceived before Arm architectures were a major force in servers. Although we have been highlighting multi-core performance, we are going to use this as a way to highlight single core performance as well.
Looking at the dhrystone 2 results, the multi-threaded results are competitive. AMD does well here since it has 32 cores per socket as well, but you can see the dual Cavium ThunderX2 setup sits between the dual AMD EPYC 7601 and the dual Intel Xeon Gold 6148. We wanted to showcase single threaded results and so here is what that looks like:
One can see the impact of the higher single core turbo boosts that Intel offers. The Cavium ThunderX2, much like many Arm designs we have seen, has competitive single thread performance but relies upon using many cores to hit upper echelons of performance.
Moving to the whetstone side, here is what the results look like for multi-threaded:
That is a dual-socket Cavium ThunderX2 system with 256 threads hitting top marks in the multi-threaded results.
Here are the single thread results:
Again, we see that the strength of the ThunderX2 architecture is running a massive number of parallel threads.
While this may be an antiquated workload, using gcc-7 the ThunderX2 CN9980 performs extremely well on our multi-threaded tests.