Pivoting slightly from our focus on high-end, high-power server CPUs, we have Intel Atom C3958 performance benchmarks under Linux. We have already published benchmarks on the Intel Atom C3338, C3558 and C3955, which are instructive as other points of reference within the Intel Atom C3000 series. While the Intel Atom C3958 does not have the clock speeds to match the Intel Atom C3955, it still has 16 cores. What it can claim is that it is the highest-bin QuickAssist part in the current Intel Atom C3000 series lineup.
A Quick Word on Intel QuickAssist
In 2016 we published a few articles around using QuickAssist with OpenSSL and for 40GbE VPN acceleration. Since then, Intel has launched a 100Gbps QAT version and has built QAT into the burgeoning Intel Xeon SP Lewisburg PCH options. We will have some cool QAT results soon. For now, a few notes:
- Intel Atom C3000 and Intel Atom C2000 QAT have different features and therefore do not use the same driver version.
- You do need Intel Atom C3xxx compatible QAT drivers.
- The Intel QAT ecosystem is significantly stronger than it was in 2016. We speculate this is due to carrier network adoption, which has helped the ecosystem mature.
QuickAssist Technology acceleration still requires some effort as Intel is not adding it into every chip. Until Intel does so, we expect most software to require an additional step (or much more) to get QAT working.
We do have QAT working on the Intel Atom C3958 already and it is enumerated as a different device type than other QAT solutions as can be seen in the screenshot.
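For readers who want to check how a QAT endpoint enumerates on their own Linux system, here is a minimal sketch that walks the PCI device tree in sysfs. The device IDs in the table are an assumption taken from the upstream Linux QAT driver headers, not from this review, and the function name is ours. Treat it as a starting point rather than a definitive detection method.

```python
from pathlib import Path

INTEL_VENDOR_ID = 0x8086

# Assumption: PCI device IDs as listed in the upstream Linux QAT driver headers.
# Verify against your kernel / driver package before relying on them.
QAT_DEVICE_IDS = {
    0x19E2: "C3xxx",     # Atom C3000 series integrated QAT
    0x37C8: "C62x",      # Lewisburg PCH QAT
    0x0435: "DH895xCC",  # discrete QAT chipset
}


def find_qat_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return a list of (pci_address, family) for recognized QAT devices."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            # sysfs exposes IDs as hex strings such as "0x8086"
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        if vendor == INTEL_VENDOR_ID and device in QAT_DEVICE_IDS:
            found.append((dev.name, QAT_DEVICE_IDS[device]))
    return found


if __name__ == "__main__":
    for address, family in find_qat_devices():
        print(address, family)
```

An empty result only means none of the listed IDs were present; it does not tell you whether a driver is loaded or whether QAT was disabled in firmware.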
The iQAT can be disabled if that is desired. We wish Intel added this to every chip so it became automatic in applications as that would help QAT support considerably.
At the same time, there are only three reasons you would get a C3958 over a C3955: QAT support, extended lifecycle, and if a specific platform you wanted to use did not have a C3955 option. That makes the QAT support a significant piece of the puzzle.
Intel Atom C3958 Key Stats
Key stats for the Intel Atom C3958: 16 cores / 16 threads at 2.0GHz. Unlike the C3955, the C3958 does not feature turbo boost, so 2.0GHz is also the maximum speed. The CPU has a 31W TDP. It also features a full 20 high-speed I/O lanes and 4x 10GbE, making it top-bin in terms of features for QuickAssist parts. These chips are not socketed, so end customer pricing will include a motherboard at a minimum. The CPU alone has a 1K unit tray price of $449. Virtualization features such as VT-d and SR-IOV are supported on this generation. Here is the ARK page for the CPU.
Also, for our readers who want to see feature flags, here is the Linux lscpu output:
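If you would rather check a handful of flags on your own system than read through the full list, a minimal sketch like the one below parses the flags line from /proc/cpuinfo. The flag names in FLAGS_OF_INTEREST are our own illustrative picks using the standard Linux spellings. The sketch makes no claim about which of them a given Atom reports.

```python
# Illustrative selection of flags; edit to taste.
FLAGS_OF_INTEREST = ("aes", "sha_ni", "sse4_2", "avx2", "avx512f", "vmx")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_flags(flags, wanted=FLAGS_OF_INTEREST):
    """Map each flag of interest to True/False depending on presence."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(report_flags(parse_cpu_flags(handle.read())))
```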
Our test configuration is very similar to what we used for our Intel Atom C2000 series reviews.
- Motherboard: Gigabyte MA10-ST0
- CPU: Intel Atom C3958
- RAM: 4x 16GB DDR4-2400 RDIMMs (Micron)
- SSD: Intel DC S3710 400GB
- Boot device: Intel DC S3700 200GB
We are using the Gigabyte MA10-ST0 for our test platform. This is an absolutely stunning storage server solution with 16x SATA ports and onboard 10Gb SFP+ networking.
The board comes with an onboard 32GB eMMC storage from Kingston. For an embedded system this is an awesome feature. On this platform we expect this eMMC to be used as a boot device rather than a more expensive SATA DOM. The four SFF-8087 ports mean that using a SATA DOM is not easy on the platform in either case, but they provide easy connectivity to storage backplanes.
We will have a full review of the Gigabyte MA10-ST0 soon, but for those wondering, the maximum power consumption with 2x 10Gb SFP+ links (SR optics) and 2x 1GbE links we have seen is around 61W. We will publish formal figures with our platform reviews but this is certainly a solid low-power platform for the performance and connectivity you are getting.
Intel Atom C3958 Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results. We do have a full set of expanded benchmarks from our next-gen test suite (Linux-Bench2) which you may see in other STH reviews that include this chip. The target market of the Intel Atom C3958 is embedded applications, which makes the original tests more useful. Generally, embedded applications such as storage controllers and networking appliances will not see heavy workloads where AVX2 / AVX-512 will be useful.
From what we saw in the Intel Atom C2000 series, there are only two OSes that matter for these embedded parts: Linux and FreeBSD. OSes like Windows have a negligible market share on these platforms and we would not recommend using an Atom C3000 series as a desktop. There are many offerings in the market more appropriate for that use case.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org along with the standard auto-generated configuration file, and compile it utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
Here we see solid performance, though not quite up to what the compute-focused Intel Atom C3955 can offer. Keen eyes will place performance around that of a Xeon D 8 core part. The microarchitecture difference is going to highlight some bigger performance differences than we would otherwise be accustomed to in our other tests.
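As a quick illustration of the compiles-per-hour metric, the conversion from a wall-clock compile time is just 3600 divided by the elapsed seconds. The function name and the 120 second figure below are hypothetical and are not taken from our scripts or results.

```python
SECONDS_PER_HOUR = 3600.0


def compiles_per_hour(elapsed_seconds):
    """Convert one kernel compile's wall-clock time into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return SECONDS_PER_HOUR / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical timing, for illustration only.
    print(compiles_per_hour(120.0))  # 30.0
```

Higher is better with this metric, which tends to be easier to read on a chart than raw seconds where lower is better.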
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
Here you can see solid performance due to having more cores and L1 cache. The Intel Xeon E3 line is not really a competitor as it lacks the features of the Atom and has significantly higher power consumption.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
There is a fairly large chasm between the 16 core Atom C3000 series part and the 16 core Xeon D part. This compression is not using QAT offload which we will have more on soon. We also sorted the chart based on compression speed which puts the Intel Atom C3958 between the six and eight core Xeon D low power parts. Decompression sort would have put it between the eight and twelve core Xeon D parts. That is solid performance either way.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
We had to remove the 2-core Atom CPUs such as the C2358 and D525 from this list as those generations made this chart borderline unreadable. This test tends to favor many cores and scales strongly with core count, which is why the C3958 performs so well here.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. It is an important component in many server stacks. We first look at our sign tests:
We also have the verify results sorted in the same order to make comparison easier.
Here we see the Intel Atom C3958 competitive with the Xeon Silver 4108. The Intel Xeon Silver 4108 is a similarly priced part for higher-power, more expandable Xeon Scalable servers. The other key point to look at here is the generational improvement. The Intel Atom C2758 was the top-end Rangeley generation Intel Atom C2000 series SKU with QuickAssist. Even without leveraging QAT, the top-bin performance has increased 4x on this test. OpenSSL is a key metric for these parts as they are commonly used in network and storage appliances.
UnixBench Dhrystone 2 and Whetstone Benchmarks
One of our longest running tests is the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results.
Having a lot of cores makes up for some of the microarchitecture trade-offs made to keep power consumption low. Still, we see some solid performance out of this part.
Gone are the days of the “wimpy” Atom. The Atom C3958 sports a low clock speed (2.0GHz) and does not have turbo boost, L3 cache, or higher-end features such as AVX2 / AVX-512 support. Yet with 1MB L2 cache per core, massive IPC improvements, and 16 cores, the Intel Atom C3958 is competitive with the Xeon D and Xeon Bronze / Silver lines in terms of performance. Although the Xeon lines are better for virtualization and general purpose compute, for most networking and storage appliances this is a very fast chip.
From a competitive side, there is a lot of talk about AMD EPYC in the market. AMD does not yet have a competitive offering in this segment since even the EPYC 7251 is a 120W TDP CPU before adding any other component to the system, or about 2x what we are seeing an entire configured Gigabyte MA10-ST0 test system pull at the outlet. To be fair, AMD is not targeting this market with EPYC. Likewise, ARM has made lots of noise, but the Intel Atom C3958 provides a solid mix of core performance and acceleration for crypto and compression. The Intel Atom C3000 series is certainly enough to hold current ARM offerings at bay for the near-term future.
Looking at the top-end QAT SKU from this generation versus the previous generation (Atom C2758), one can see that the lineup has significantly expanded its market coverage at the top end. Clock speeds are down ~17%, but that is the only area where we are seeing specs decline. Core count has doubled from 8 to 16 cores. Cache size and RAM capacity have quadrupled to 16MB and 256GB respectively. Networking is effectively 10x the speed of the previous generation. PCIe and SATA have moved up a generation and greatly expanded in numbers. TDP is up 55% to match the massive performance and platform upgrades. At the same time, pricing is now much higher, up around 116%. Of course, Intel has parts like the Atom C3758 which address a similar market segment to the previous top of the line part, but it shows how Intel is allowing the Intel Atom C3000 line to creep up higher in the performance stack.
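For those who want to check the generational math, here is a small sketch of the percent-change arithmetic. The Atom C3958 figures come from the key stats earlier in this piece. The Atom C2758 figures (2.4GHz, 20W TDP, $208 tray price) are an assumption based on Intel's public listings for that part rather than numbers stated in this article.

```python
def pct_change(old, new):
    """Percent change going from old to new (negative means a decline)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Assumption: C2758 values taken from Intel's public listings, not this article.
C2758 = {"cores": 8, "clock_ghz": 2.4, "tdp_w": 20, "price_usd": 208}
# C3958 values as quoted in the key stats section.
C3958 = {"cores": 16, "clock_ghz": 2.0, "tdp_w": 31, "price_usd": 449}


def generational_deltas(old, new):
    """Percent change for every spec present in the old part's table."""
    return {key: round(pct_change(old[key], new[key]), 1) for key in old}


if __name__ == "__main__":
    for spec, delta in generational_deltas(C2758, C3958).items():
        print(f"{spec}: {delta:+.1f}%")
```

With those inputs the clock speed, TDP and price deltas land at roughly -17%, +55% and +116%, in line with the figures quoted above.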
Overall, this is an enormous generational upgrade in performance, but we expect the Intel Atom C3958 to be a lower volume part given its hefty price tag. At $449 for the CPU it is competing with the Intel Xeon Silver 4108 and Xeon D lines.
If you want to learn more, we have complete coverage at our Denverton Day Official STH Intel Atom C3000 Launch Coverage Central.