Supermicro A2SDi-TP8F Performance
For this exercise, we are using our legacy Linux-Bench scripts, which help us see the cross-platform “least common denominator” results we have been using for years, as well as several results from our updated Linux-Bench2 scripts. At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data, but we also have services to let companies run their own workloads in our lab, such as our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not on a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
As a quick note, we did our Intel Atom C3858 benchmark and review piece on this platform so you can see more on performance, features, and competitive analysis there.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file and the Linux 4.4.2 kernel from kernel.org, and we make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
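For readers who want to reproduce the "compiles per hour" figure on their own hardware, the conversion from a single build's wall-clock time can be sketched as follows. This is a minimal illustration, not the Linux-Bench script itself; the function names `time_command` and `compiles_per_hour` and the 240-second example are assumptions made up for this sketch.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return its wall-clock duration in seconds.

    Hypothetical helper: pass something like ["make", "-j12"] from a
    prepared kernel source tree. It is not called in this sketch.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compiles_per_hour(elapsed_seconds):
    """Convert one build's elapsed time into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


# Illustrative input only (not a measured result): a 240-second build.
print(compiles_per_hour(240.0))  # 15.0
```

Expressing the result as a rate rather than a time means higher is better, which makes charts across very different CPUs easier to compare.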
Aboard the A2SDi-TP8F, the Atom C3858 relies on a higher number of lower-frequency cores to reach its performance levels. These 12 cores are not equivalent to Xeon Scalable cores, but they do not need to be.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 4K results which work well at this end of the performance spectrum.
In this benchmark, we can see nice scaling from the Atom C3758 (8-core) part to this Atom C3858 (12-core), then to the Atom C3958 (16-core) parts.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Compression is a common workload for platforms in this segment. Here, we see significant performance gains over the Intel Atom C3758. These chips have the same TDP, but one can see the Atom C3858 pulls ahead due to having 50% more cores albeit at lower clock speeds.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important component in many server stacks. We first look at our sign tests:
Here are the verify results:
OpenSSL is another common workload in these platforms. Our results are focused on using the standard CPU features, not Intel QuickAssist technology. Intel QAT is an accelerator for compression and encryption that is now several generations old. More programs are now supporting QAT but it still requires explicitly declaring support rather than being a built-in feature. In some OSes, it can be difficult to even get QAT running to accelerate features like ZFS compression.
Intel needs to do a better job expanding the QAT reach to more of its SKUs so we can finally get encryption for “free” on its CPUs. If you have purpose-built hardware then this provides a lot of potential as we showed in our Intel QuickAssist at 40GbE Speeds: IPsec VPN Testing and Intel QuickAssist Technology and OpenSSL Benchmarks and Setup Tips pieces.
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
On the chess benchmarking side, the Atom C3858 performs well, landing in the range of 4-8 core parts built on more robust microarchitectures. Here, the lack of features such as BMI2 instruction support can make for lower performance than we see on higher-end chips. These small instruction differences are important since they differentiate Atom cores. As another example, while one gets AVX-512 on newer generation Xeon D-2100 and Xeon Scalable parts, those specialized instructions are not available on the Atom parts.
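On Linux, one way to check which instruction set extensions a CPU exposes is to read the `flags` line in `/proc/cpuinfo`. The sketch below parses that text and reports whether given flags are present. The helper names and the sample string are illustrative assumptions, and the sample is not a dump from an actual Atom C3858. `bmi2` and `avx512f` are the flag names the Linux kernel uses on x86.

```python
from typing import Dict, Iterable, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flags(cpuinfo_text: str, wanted: Iterable[str]) -> Dict[str, bool]:
    """Map each wanted flag name to True/False depending on presence."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


# Made-up sample text for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu sse4_2 aes sha_ni\n"
print(has_flags(sample, ["bmi2", "avx512f", "aes"]))
```

On a real system, one would pass the contents of `/proc/cpuinfo` instead of the sample string, for example `has_flags(open("/proc/cpuinfo").read(), ["bmi2"])`.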
Supermicro A2SDi-TP8F Power Consumption
We used our pair of Extech TrueRMS Power Analyzer 380803 units to take measurements at different points of A2SDi-TP8F operation on 120V power in the embedded lab. Embedded platforms tend to spend more time at the edge in offices rather than in higher power data centers, which is why we do our testing at a lower voltage. Here are the figures:
- Power off BMC only: 4.4W
- OS Idle: 22.3W
- 100% Load: 40.9W
- Maximum Observed: 44.3W
These are solid results. The A2SDi-TP8F performs well and at a lower power level than the 16-core SKUs. One could potentially fit two of these systems in a 1U, 1A @ 120V rack power budget.
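The arithmetic behind the rack budget claim can be checked quickly. The sketch below assumes the budget is simply volts times amps (120W for 1A @ 120V) and uses the maximum observed draw of 44.3W from the figures above. The function name is made up for illustration, and real colocation contracts may account for power differently.

```python
def systems_per_budget(volts, amps, watts_per_system):
    """Return (whole systems that fit, leftover watts) for a power budget."""
    budget_watts = volts * amps
    count = int(budget_watts // watts_per_system)
    headroom = budget_watts - count * watts_per_system
    return count, round(headroom, 1)


# 1A @ 120V budget, sized against the 44.3W maximum observed draw.
count, headroom = systems_per_budget(120, 1.0, 44.3)
print(count, headroom)  # 2 31.4
```

Sizing against the maximum observed figure rather than the idle figure is the conservative choice, since a third system would push the worst case to 132.9W and over the 120W budget.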
Next, we are going to look at the block diagram and topology before getting to our final words.