Intel Atom C3338 Benchmarks – Why Denverton is so Sweet

Intel C3000 Denverton Day On STH

What happens when you are looking for a low power x86 appliance for storage or networking but do not want to spend a fortune? Intel’s newest weapon in the low power SoC market is the Atom C3000 series, codenamed “Denverton.” We published the first benchmarks of Denverton a few weeks ago as well as Intel’s official announcement. The lowest end Atom C3000 series chip we have seen was also the first released, in January 2017: the Intel Atom C3338. The Intel Atom C3338 sports a dual-core CPU with a base clock of 1.5GHz and a maximum turbo of 2.2GHz. Compared to the Intel Atom C2358, that is a decrease of 200MHz in base clock and an increase of 200MHz in turbo clock, which shows the evolution of Intel Turbo Boost. The Intel Atom C3338 also sports 4MB of L2 cache, up from 1MB on the dual-core Atom C2000 series part. If those specs make you dizzy, here is the short version: for less than half the recommended price ($27 v. $60) and a similar TDP (9W v. 7W), Intel is offering higher performance cores and 4x the amount of L2 cache.
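For readers who want to check the generational arithmetic, here is a minimal sketch that recomputes the deltas from the figures quoted above. The `ChipSpec` class and `compare` function are hypothetical names made up for this illustration, and the Atom C2358 clocks (1.7GHz base, 2.0GHz turbo) are derived from the 200MHz differences stated in the paragraph rather than taken from a spec sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Headline specs as quoted in the article (hypothetical container)."""
    name: str
    base_mhz: int
    turbo_mhz: int
    l2_mb: int
    price_usd: int
    tdp_w: int


# Figures from the paragraph above. The C2358 clocks are derived from the
# stated -200MHz base / +200MHz turbo differences (an assumption).
C3338 = ChipSpec("Atom C3338", 1500, 2200, 4, 27, 9)
C2358 = ChipSpec("Atom C2358", 1700, 2000, 1, 60, 7)


def compare(new: ChipSpec, old: ChipSpec) -> dict:
    """Return the generation-over-generation deltas and ratios."""
    return {
        "base_delta_mhz": new.base_mhz - old.base_mhz,
        "turbo_delta_mhz": new.turbo_mhz - old.turbo_mhz,
        "l2_ratio": new.l2_mb / old.l2_mb,
        "price_ratio": new.price_usd / old.price_usd,
        "tdp_delta_w": new.tdp_w - old.tdp_w,
    }


if __name__ == "__main__":
    for key, value in compare(C3338, C2358).items():
        print(f"{key}: {value}")
```

The price ratio works out to 27/60, which is where the "less than half" figure comes from.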

Test Configuration

We tried to model a realistic configuration for the Intel Atom C3338 chip.

We had a single 16GB RDIMM available so that is what we used. We should note that the Intel Atom C3338 has a single-channel memory controller that supports up to 64GB RAM (2x 32GB DDR4-1866 RDIMMs). Compare this with the Atom C2358, which supported only DDR3 and unbuffered ECC DIMMs for a maximum of 32GB. Practically, the Atom C2358 had a 16GB total RAM limitation as 16GB low power DDR3 ECC UDIMMs were hard to get and not well supported. We are going to have a full review of the Supermicro platform soon. The review is already written and in the publishing queue. For now, we are going to focus on the CPU performance.

Intel Atom C3338 Benchmarks

For our testing we are using Linux-Bench scripts, which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are likely going to update Linux-Bench in the near future with a few new tests as well as a faster revision that is even simpler to use, but for now, we are using our old Ubuntu 14.04.3 LTS version. We did have to compile new ixgbe network drivers to get the setup working with Denverton’s Intel X553 NICs.

The item to remember here is that any benchmark we are publishing has had at least 10,000 profiling runs on a multitude of different architectures to ensure we get consistent results before we add it to our repertoire. Unlike most other benchmark sites we also test under full heat soak conditions akin to how servers are deployed in the real world to get useful numbers. Given modern processor architectures, all of which manage clock speeds based on temperature, tests conducted with less than 24 hours of heat soak run time are just about useless. We ran the system, and other listed systems, for a full day before we started taking power and performance measurements. If we needed to change something, we reset our 24-hour heat soak clock before taking data runs.
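The heat soak rule above is simple enough to express in a few lines. The following is only an illustrative sketch, not the tooling we actually use; `HeatSoakClock` and its methods are hypothetical names. It captures the two rules from the paragraph: no measurements before 24 hours of run time, and any change to the system restarts the clock.

```python
from datetime import datetime, timedelta

# The 24-hour figure comes from the paragraph above.
SOAK_PERIOD = timedelta(hours=24)


class HeatSoakClock:
    """Tracks whether a system has soaked long enough to take data runs."""

    def __init__(self, started_at: datetime) -> None:
        self.started_at = started_at

    def reset(self, now: datetime) -> None:
        """Call whenever the configuration changes; the soak starts over."""
        self.started_at = now

    def ready(self, now: datetime) -> bool:
        """True once a full soak period has elapsed since the last start or reset."""
        return now - self.started_at >= SOAK_PERIOD
```

Passing the current time in explicitly, rather than reading the system clock inside the class, keeps the logic easy to check by hand.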

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then run make with every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
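As a rough illustration of how a timed build maps to the compiles per hour figure, here is a minimal sketch. It is not the Linux-Bench script itself: `compiles_per_hour` and `time_kernel_build` are hypothetical helper names, and the sketch assumes the kernel source is already unpacked with the configuration file in place as `.config`.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one full build into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_kernel_build(source_dir: str) -> float:
    """Time one `make` run that uses every hardware thread in the system.

    Assumes source_dir holds an unpacked kernel tree with a .config
    already present (an assumption; the real script may differ).
    """
    threads = os.cpu_count() or 1
    start = time.perf_counter()
    subprocess.run(["make", f"-j{threads}"], cwd=source_dir, check=True)
    return time.perf_counter() - start
```

With this conversion, a build that takes 30 minutes (1,800 seconds) scores 2.0 compiles per hour, so higher is better on the chart.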

Intel Atom C3338 Python Linux Kernel Compile Benchmark

The key takeaway here is that there is a huge improvement over the previous generation Intel Atom C2358. The overall performance is still well behind larger cores. Both current generation Broadwell-DE and Skylake designs benefit from L3 cache and more robust cores. If you run Linux systems, there is a good chance you will be compiling software at some point. If that is the case, the newest generation of Denverton CPUs will offer noticeable improvements.

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.

Intel Atom C3338 C Ray Benchmark

This was one of the more interesting results. The two-core Intel Atom C3338 outperforms the four-core Intel Atom C2558 by the slimmest of margins. After years of IPC advancements, we get interesting results such as these.

7-zip Performance

7-zip is a widely used compression/ decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

Intel Atom C3338 7zip Benchmarks

Here we get solid performance improvements over the Atom C2358. We see that higher core counts and larger cores do provide significantly more performance.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here.

Intel Atom C3338 NAMD Benchmark

This is not going to be a typical workload for an embedded processor. At the same time, it is one of our standard workloads, and here the dual-core Intel Atom C3338 provides almost as much performance as the quad-core Intel Atom C2558.

Sysbench CPU test

Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.

Intel Atom C3338 Sysbench CPU Benchmark

We found the sysbench CPU test to be intriguing as the C3338 notched a slight victory over the older generation four core parts.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. It is an important library in many server stacks. We first look at our sign tests:

Intel Atom C3338 OpenSSL Sign Benchmark

These results are in line with the quad-core benchmarks we published previously.

Intel Atom C3338 OpenSSL Verify Benchmark

As more of these parts are released, we are going to do a more in-depth OpenSSL piece. From what we have been seeing, many of the Intel Atom C3000 OpenSSL benchmarks are showing >2x performance gains due, in part, to better AES crypto units. While we see large, across the board, improvements between the Atom C2000 and Atom C3000 series, the OpenSSL performance is a clear standout. OpenSSL is a workload that is important for many of the embedded applications the Intel Atom C3338 is intended for.

UnixBench Dhrystone 2 and Whetstone Benchmarks

Of course, these chips are not meant for heavy compute but we pick out the UnixBench 5.1.3 Dhrystone 2 and Whetstone results to show some of the raw performance they are capable of. UnixBench is widely used so it is a good comparison point.

Intel Atom C3338 UnixBench Dhrystone 2 Benchmark

We added the Atom D525 benchmarks in this chart just to give one a good sense of how far we have come on the embedded side.

Intel Atom C3338 UnixBench Whetstone Benchmark

Overall our Intel Atom C3338 performs well, often using massive IPC improvements to leapfrog previous generation performance.

Final Words

The Intel Atom C3338 shows promise for the Intel Denverton series. The performance per clock and per core is significantly higher than with the Intel Atom C2000 series. At the same time, from the Intel Atom C3338 and other chips we have used, performance oriented applications will still favor Intel’s larger cores such as Broadwell-DE. On the other hand, if you are building an appliance where you need low power x86 and want modern features such as PCIe 3.0, 10GbE and SATA 3.0, the Intel Atom C3338 can be a strong contender. The platform is still waiting for the software ecosystem to catch up with drivers for an easy out-of-the-box experience. In terms of power consumption, we are going to publish those figures with our official platform reviews since so much power is consumed by additional onboard components when you get to <10W SoCs. For now, we will say that the minimum/maximum power seems to be close to what we saw with the Atom C2000 series, just with significantly more performance.

7 COMMENTS

  1. Do you have power consumption figures too that are comparable with your previous Avoton/Rangeley/Xeon-D test? Thanks.

  2. The OpenSSL benchmark needs some clarification: What are you using to sign and verify? nistp256? RSA2048? Is this per-core, or multicore? What engine is used – none or rdrand?

    To imitate use in practice you’d combine RSA2048 signing and ecdh256 ops/s. That’s what’s done in the full TLS handshake with PFS cipher suites. (Or RSA2048 and X25519. In a few cases replace RSA2048-sign with nistp256-signing.)

    Does your OpenSSL version include patches for AVX* and BMI*, or is this vanilla OpenSSL?

    “AES crypto units” (I assume you have AES-NI and QuickAssist/qat in mind) are involved in neither RSA2048-signing/verifying nor any elliptic curve ops.

  3. Many of the numbers are interesting, but I hope people keep in mind that some of the added benefits of the newer parts might be lost for particular applications. Take, for example, the number of people jumping on the C2x58 platform for use with pfSense while singing the praises of QuickAssist. Several years later, and pfSense STILL doesn’t make any use of QAT on the C2x58 platform. It likely never will.

    Perhaps servethehome should benchmark these chips in situations where they are expected to be used, instead of in environments that offer the greatest advantage. Compare a C2358 to a C3338 limiting yourself to the environment given in a pfSense installation (for networking) or a FreeNAS installation (for storage.) I think this would give your readers a much better appreciation of the advantages (or not) in a newer platform.

  4. As far as I can tell, the $27 Atom C3338 does not support 10GbE, only 1GbE and 2.5GbE. QuickAssist is also not present on this particular Denverton chip.

  5. I’ve been waiting for these Denverton based Supermicro boards for a long time, but I really hope these won’t be the only ones: Denverton is sweet but Supermicro made some really poor decisions which limit its full potential. The internal USB 3.0 connector is a total nonsense (they skipped the only reasonable choice, that is, eMMC). I understand the non technical reasons to go for a BMC but for the low end SKUs it’s not the best option. I think they could have offered a x2 M.2 slot for the c3338 too (my guess is: x2 or x4 PCIe / 2 or 4 sata3 + 1 USB3 + another x2 for the BMC + 1 unused HSIO?) in addition to the classic PCIe slot, which is usually pointless for basic setups (yes people do not change board every 1-2 years and NVMe will have a good adoption soon). Also, I understand it saves money in terms of layouting and BOM, but why 4x1G on so many boards? Who cares? For server farms / data center 10Gb is the bare minimum, for SMB 2x1G are okay.
    Also the pricing is a bit surprising on the low end of the scale:
    – for small home setups, if you do not care about ECC and QAT, mobile CPUs in NUCs, especially with vPro/AMT, are probably a quiet and more compact choice
    – otherwise a Pentium 1508 ends up being much better for just a few more bucks (X10SDV-2C-TLN2F is an absolutely beautiful and well balanced board).
    Disappointed.

  6. Can you please provide some information on power consumption? Would love to see even some crude idle/load numbers.

  7. This was posted in the forums but the particular box was sub 25w even with the large fans. That is not our typical power consumption setup. We will have more power info on our board and system level reviews.
