What happens when you are looking for a low power x86 appliance for storage or networking but do not want to spend a fortune? Intel’s newest weapon in the low power SoC market is the Atom C3000 series, codenamed “Denverton.” We published the first benchmarks of Denverton a few weeks ago, as well as Intel’s official announcement. The lowest end Atom C3000 series chip we have seen was also the first released, in January 2017: the Intel Atom C3338.
The Intel Atom C3338 sports a dual-core CPU. Base clock is 1.5GHz with a maximum turbo of 2.2GHz. Compared to the Intel Atom C2358, that is a decrease of 200MHz in base clock and an increase of 200MHz in turbo clock, which shows the evolution of Intel Turbo Boost. The Intel Atom C3338 also sports 4MB of L2 cache, up from 1MB on the dual-core Atom C2000 series part. If those specs make you dizzy, here is the short version: for less than half the recommended price ($27 v. $60) and a similar TDP (9W v. 7W), Intel is offering higher performance cores and 4x the amount of L2 cache.
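To make the generational comparison easier to follow, here is a minimal sketch that tabulates the figures quoted above. The Atom C2358 clocks are not stated directly; they are derived here from the 200MHz deltas mentioned in the text, so treat them as an assumption. The dictionary keys and the `compare` helper are illustrative names only.

```python
# Figures as quoted in the text. The C2358 clocks are an assumption,
# derived from the stated 200MHz base/turbo deltas.
c3338 = {"base_mhz": 1500, "turbo_mhz": 2200, "l2_mb": 4, "price_usd": 27, "tdp_w": 9}
c2358 = {"base_mhz": 1700, "turbo_mhz": 2000, "l2_mb": 1, "price_usd": 60, "tdp_w": 7}


def compare(new, old):
    """Return deltas and ratios of the new part relative to the old part."""
    return {
        "base_delta_mhz": new["base_mhz"] - old["base_mhz"],
        "turbo_delta_mhz": new["turbo_mhz"] - old["turbo_mhz"],
        "l2_ratio": new["l2_mb"] / old["l2_mb"],
        "price_ratio": new["price_usd"] / old["price_usd"],
        "tdp_delta_w": new["tdp_w"] - old["tdp_w"],
    }


summary = compare(c3338, c2358)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A price ratio below 0.5 corresponds to the "less than half the recommended price" claim, and an L2 ratio of 4 corresponds to the "4x the amount of L2 cache" claim.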
We tried to model a realistic configuration for the Intel Atom C3338 chip.
- CPU: Intel Atom C3338
- Memory: 16GB Crucial DDR4-2133 RDIMM
- Motherboard: Supermicro A2SDi-2C-HLN4F
- Chassis: Supermicro SC721TQ-250B
We had a single 16GB RDIMM available so that is what we used. We should note that the Intel Atom C3338 has a single-channel memory controller that supports up to 64GB of RAM (2x 32GB DDR4-1866 RDIMMs). Compare this with the Atom C2358, which supported only DDR3 and unbuffered ECC DIMMs for a maximum of 32GB. Practically, the Atom C2358 had a 16GB total RAM limitation, as 16GB low power DDR3 ECC UDIMMs were hard to get and not well supported. We are going to have a full review of the Supermicro platform soon. The review is already written and in the publishing queue. For now, we are going to focus on the CPU performance.
Intel Atom C3338 Benchmarks
For our testing we are using Linux-Bench scripts, which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are likely going to update Linux-Bench in the near future with a few new tests as well as a simpler to use, faster revision, but for now we are using our old Ubuntu 14.04.3 LTS version. We did have to compile new ixgbe network drivers to get the setup working with Denverton’s Intel X553 NICs.
The item to remember here is that any benchmark we are publishing has had at least 10,000 profiling runs on a multitude of different architectures to ensure we get consistent results before we add it to our repertoire. Unlike most other benchmark sites we also test under full heat soak conditions akin to how servers are deployed in the real world to get useful numbers. Given modern processor architectures, all of which manage clock speeds based on temperature, tests conducted with less than 24 hours of heat soak run time are just about useless. We ran the system, and other listed systems, for a full day before we started taking power and performance measurements. If we needed to change something, we reset our 24-hour heat soak clock before taking data runs.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then run make with every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
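As a rough illustration of the compiles per hour metric, the sketch below times a build command and converts the elapsed wall-clock seconds into a per-hour rate. This is not the actual Linux-Bench script; the function names, the `make -j` invocation, and the source directory are assumptions made for the example.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds):
    """Convert one build's wall-clock time into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_command(cmd, cwd=None):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def kernel_compile_rate(source_dir):
    """Hypothetical helper: build with one make job per hardware thread."""
    jobs = os.cpu_count() or 1
    elapsed = time_command(["make", f"-j{jobs}"], cwd=source_dir)
    return compiles_per_hour(elapsed)


# Example conversion: a build that takes 120 seconds is 30 compiles per hour.
print(compiles_per_hour(120.0))  # 30.0
```

Expressing the result as a rate means that higher is better, which keeps it consistent with the other charts where a larger bar indicates more performance.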
The key takeaway here is that there is a huge improvement over the previous generation Intel Atom C2358. The overall performance is still well behind larger cores. Both current generation Broadwell-DE and Skylake designs benefit from L3 cache and more robust cores. If you run Linux systems, there is a good chance you will be compiling software at some point. If that is the case, the newest generation of Denverton CPUs will offer noticeable improvements.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
This was one of the more interesting results. The two-core Intel Atom C3338 outperforms the four-core Intel Atom C2558 by the slimmest of margins. Years of IPC advancements give us interesting results such as these.
7-zip is a widely used compression/ decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Here we get solid performance improvements over the Atom C2358. We see that higher core counts and larger cores do provide significantly more performance.
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here.
This is not going to be a typical workload for an embedded processor. At the same time, it is one of our standard workloads, and here the dual-core Intel Atom C3338 delivers almost as much performance as the quad-core Intel Atom C2558.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
We found the sysbench CPU test to be intriguing as the C3338 notched a slight victory over the older generation four core parts.
OpenSSL is widely used to secure communications between servers. It is an important component of many server stacks. We first look at our sign tests:
This is in line with what we saw in the quad-core benchmarks we published previously.
As more of these parts are released, we are going to do a more in-depth OpenSSL piece. From what we have been seeing, many of the Intel Atom C3000 OpenSSL benchmarks are showing >2x performance gains due, in part, to better AES crypto units. While we see large, across the board, improvements between the Atom C2000 and Atom C3000 series, the OpenSSL performance is a clear standout. OpenSSL is a workload that is important for many of the embedded applications the Intel Atom C3338 is intended for.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Of course, these chips are not meant for heavy compute but we pick out the UnixBench 5.1.3 Dhrystone 2 and Whetstone results to show some of the raw performance they are capable of. UnixBench is widely used so it is a good comparison point.
We added the Atom D525 benchmarks in this chart just to give one a good sense of how far we have come on the embedded side.
Overall our Intel Atom C3338 performs well, often using massive IPC improvements to leapfrog previous generation performance.
The Intel Atom C3338 shows promise for the Intel Denverton series. The performance per clock and per core is significantly higher than with the Intel Atom C2000 series. At the same time, judging from the Intel Atom C3338 and other chips we have used, performance oriented applications will still favor Intel’s larger cores such as Broadwell-DE. On the other hand, if you are building an appliance where you need low power x86 and want modern features such as PCIe 3.0, 10GbE and SATA 3.0, the Intel Atom C3338 can be a strong contender. The platform is still waiting for the software ecosystem to catch up with drivers for an easy out-of-the-box experience. In terms of power consumption, we are going to publish those figures with our official platform reviews, since so much power is consumed by additional onboard components when you get to <10W SoCs. For now, we will say that the minimum/maximum power seems to be close to what we saw with the Atom C2000 series, just with significantly more performance.