Tyan Transport CX GC68B8036-LE Performance
At STH, we have an extensive set of performance data from every major server CPU release. Running through our standard test suite generated over 1000 data points for each set of CPUs. We are cherry-picking a few to give some sense of CPU scaling.
We are also going to do something a bit different with this review. We are going to discuss the consolidation of a 4-year-old 2U 4-node system to a single-socket server like this one.
This is our lab’s 2U 4-node dual Intel Xeon E5-2630 V4 system. We actually have three of these systems: two with E5-2630 V4s and one with E5-2628L V4s. These are now 4-year-old systems and so directly in the middle of the 3-5 year replacement lifecycle. The E5-2630 V4 was also a popular SKU. In the Xeon Scalable range, this roughly became the Xeon Silver 4114, although the Xeon Silver 4114 is faster. As a result, we wanted to look at a bit of data, where it made sense, to see if we could consolidate two or more of these systems into a single-socket cost-optimized solution.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org along with a standard configuration file, and make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
Here we can see some nice scaling. Just to be clear, we ran the same compile job on all four Xeon E5 V4 nodes and then on the AMD EPYC systems. Expressing the results in compiles per hour lets us compare performance directly. Since each node runs its own copy of the job, it also means we get effectively linear scaling for the dual Xeon nodes.
Here we can see the impact of the new EPYC Milan chips versus the older Rome generation. While the older generation was close, we hit a tipping point with the newer generation. We were able to test using both the Rome BIOS and the updated Milan BIOS on the Transport CX because we had to flash the BIOS for Milan compatibility.
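The conversion to compiles per hour, and the way independent nodes add up, can be sketched as below. This is only an illustration and not the STH test harness: the function names are hypothetical and the timings are placeholder values, not measured results.

```python
def compiles_per_hour(seconds_per_compile: float) -> float:
    """Convert the wall-clock time of one kernel compile into compiles per hour."""
    if seconds_per_compile <= 0:
        raise ValueError("compile time must be positive")
    return 3600.0 / seconds_per_compile


def aggregate_compiles_per_hour(node_times: list[float]) -> float:
    """Sum the per-node rates for nodes that each run the same job independently."""
    return sum(compiles_per_hour(t) for t in node_times)


# Placeholder timings in seconds, one per node (assumed values, not from the review).
four_node_times = [120.0, 120.0, 120.0, 120.0]
print(aggregate_compiles_per_hour(four_node_times))  # 120.0
```

Because each node compiles on its own, the aggregate is a plain sum of the per-node rates, which is why a four-node chassis scales close to linearly on this metric.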
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our new Linux-Bench2 8K render to show differences.
Here we were able to sneak some runs onto the 2U 4N system with the 12-core Xeon E5-2628L V4. This type of benchmark, analogous to Cinebench on the Windows/desktop side, scales very well with core counts but also tends to favor AMD architectures (as does Cinebench), so we thought it would be more interesting to show the higher core count results. This is 32/64 AMD cores versus 96 Xeon E5 V4 cores.
Something we will note is that we are starting to hit some scaling limits of the 8K test at the higher end of the dual-socket range. When the Xeon E5 V4s were out, the 4K test was still relevant.
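For reference, the 96-core figure is the total across the whole 2U 4-node chassis. A quick sketch of that arithmetic, with a hypothetical helper name:

```python
def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Total physical cores across a multi-node chassis."""
    return nodes * sockets_per_node * cores_per_socket


# 2U 4-node system, dual-socket nodes, 12-core Xeon E5-2628L V4 per socket.
xeon_cores = total_cores(nodes=4, sockets_per_node=2, cores_per_socket=12)
print(xeon_cores)  # 96
```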
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Although we are focusing on the lower-end CPUs, we also wanted to give a sense of what higher-end configurations may look like, from generations that are in that 3-5 year refresh cycle. Here we can see the top-end dual-socket Intel Xeon E5-2699 V4 configuration, with 22 cores per CPU and a list price of over $4000 at the time, effectively consolidated down to a single $2730 CPU option in this Tyan system. Even against value-segment 2nd Gen Intel Xeon Scalable Refresh SKUs, at around the same price as two Xeon Gold 5218R processors, we get more performance with fewer cores from the EPYC 7543 / 7543P. Note, we have tested previously that the P and non-P parts perform basically identically, but one would likely use the EPYC 7543P in a system like this from a cost perspective.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
Since this is a virtualization workload, it is easy to scale out to multiple nodes. Here, we can see perhaps the most interesting result. When we have smaller VM sizes that fit into the lower core/memory count per CPU, the 4x dual-socket solution works very well. As we get to larger VM sizes, we start crossing CPU and node boundaries, and those crossings cause SLA misses.
Perhaps the other interesting takeaway was on the “H” test. Of the ten runs, four could complete at 8 VMs, but six could only complete at 7 VMs. This was very close and may be an impact of the 2U 4-node form factor, since we test servers under fully heat-soaked conditions in the lab, with nodes above and below to simulate real deployments. We are using the more conservative number here since that is what we had on the majority of the results, but this is close enough that it is debatable. Still, the impact of having a single socket versus having resources spread across four nodes and eight sockets is very visible here.
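The way the “H” result was picked can be sketched as a majority vote over the ten runs. This is a minimal illustration with a hypothetical function name; the rule of preferring the lower VM count on a tie is our assumption and did not come into play here.

```python
from collections import Counter


def reported_vm_count(run_results: list[int]) -> int:
    """Return the VM count seen in the most runs; on a tie, take the lower count."""
    counts = Counter(run_results)
    most_runs = max(counts.values())
    # Assumed tie-break: report the more conservative (lower) VM count.
    return min(vms for vms, runs in counts.items() if runs == most_runs)


# The "H" test: four runs completed at 8 VMs, six runs at 7 VMs.
h_test_runs = [8] * 4 + [7] * 6
print(reported_vm_count(h_test_runs))  # 7
```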
Lower-end CPU Options
We recognize that some may be looking for lower-end CPU options. We had a limited amount of time to do this review, but the story here is very interesting. Instead of having low-end EPYC 7003 parts, AMD is retaining low-end AMD EPYC 7002 CPUs such as the AMD EPYC 7282. These can offer lower costs, but also lower performance. That is partly due to the fact that these are 4-channel memory-optimized parts. We did a piece on the AMD EPYC 7002 Rome CPUs with Half Memory Bandwidth.
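To put the half memory bandwidth point in numbers, here is a rough sketch of theoretical peak bandwidth as a function of channel count. The DDR4-3200 speed and the 8-byte (64-bit) channel width are assumptions for illustration, not figures from this review, and real-world bandwidth will be lower than the theoretical peak.

```python
def peak_memory_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


# Assumed DDR4-3200 in both cases; only the effective channel count differs.
full = peak_memory_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
half = peak_memory_bandwidth_gbs(channels=4, mega_transfers_per_s=3200)
print(full, half, half / full)  # 204.8 102.4 0.5
```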
Storage tends to be an application where chips such as these are very popular since one can use four DIMMs to lower costs. In a deployment that is more cost-focused than the dense configuration we are looking at here, one may use one of these CPUs to get basic platform connectivity. While we did not get to test them in this system, we have tested these CPUs in the Tyan Transport SX TS65A-B8036 2U 28-bay server that uses a similar Tyan S8036 motherboard.
Next, we are going to move to our power consumption, server spider, and final words.