Intel Atom C3558 Linux Benchmarks and Review

Intel Atom C3000 Denverton Package STH

In the Intel Atom C3000 series, we see the Intel Atom C3558 as something special. It has four cores, and plenty of horsepower to drive most network and storage appliances. We think this is going to be a SKU a lot of our readers will find uses for as the potential for embedded appliances is strong.

Intel Atom C3558 Overview

In the Intel Atom C3000 codenamed “Denverton” world, the Intel Atom C3558 is a lower-end SKU. Just to provide some level of comparison, here is the overall list of Denverton C3000 series SKUs:

Intel Atom C3000 Denverton Launch SKU List 3 Formatted

Key stats for the Intel Atom C3558: 4 cores / 4 threads, 2.2GHz base and turbo with an 8MB L2 cache. The CPU features a paltry 16W TDP. Here is the Intel ARK page for the official reference. For those wondering about feature sets, here is the lscpu output of the chip:

Intel Atom C3558 Lscpu Flags
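
For readers who prefer text to a screenshot, the flag list in lscpu output can be checked with a few lines of Python. This is a minimal sketch: the `SAMPLE` text is an illustrative, made-up excerpt rather than the output captured from our C3558, and the function names and the list of features to look for are our own assumptions.

```python
import subprocess


def parse_lscpu_flags(lscpu_text):
    """Return the set of CPU flags found on the 'Flags:' line of lscpu output."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def report_features(flags, wanted=("aes", "sse4_2", "avx", "avx2")):
    """Map each feature name of interest to True/False depending on presence."""
    return {name: name in flags for name in wanted}


def read_live_lscpu():
    """Capture lscpu output on the local machine (not called in this sketch)."""
    return subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    ).stdout


# Illustrative sample only; NOT the actual flag list from the review system.
SAMPLE = """Architecture:  x86_64
CPU(s):        4
Flags:         fpu vme de pse sse sse2 ssse3 sse4_1 sse4_2 aes
"""

if __name__ == "__main__":
    print(report_features(parse_lscpu_flags(SAMPLE)))
```

On a real system, passing `read_live_lscpu()` to `parse_lscpu_flags` in place of `SAMPLE` would report the flags for that machine.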

One of the advantages over other architectures is that the Intel Atom C3558 is an x86 CPU. As a result, one can easily manage embedded appliances over existing management frameworks. For example, here is the platform being integrated into our Rancher container orchestration that resides in our main data center.

Intel Atom C3558 In Rancher Docker Swarm

From here we could run containers as normal. One will also note that we are able to use an older kernel, 4.4.0, with the setup. Many of the newer CPUs require newer kernels, but this ran just fine. Note that if you want out-of-the-box support for the NICs, you will want something such as the Ubuntu 16.04.3 HWE kernel, which includes that support.

Test Configuration

For this system, we used a pre-production PCB, but otherwise kept to our standard test configuration.

  • Motherboard: Supermicro A2SDi-4C-HLN4F
  • Memory: 32GB (2x16GB) Crucial DDR4-2133 RDIMM
  • SSD: Intel DC S3710 400GB
  • SATADOM: Supermicro 32GB SATADOM
  • OSes Tested: Ubuntu 14.04.5 LTS, 16.04.3 LTS

We had a single 16GB RDIMM available so that is what we used. The motherboard we were using was a pre-production sample that did not have 10GbE NICs. As a result, we are not going to publish power numbers for the system outside of our formal review. If you looked at our Supermicro SYS-5029A-2TN4 NAS review based on the Intel Atom C3338 and added 10-15W you would be in the ballpark of power consumption for this board.

Hardware support in legacy OSes makes installation slightly more challenging. This is similar to what we see with every new embedded NIC so it was expected. You can read about how to get this working in our piece: Day 0 with Intel Atom C3000: Getting Intel X553 NICs Working. If you are using a newer OS such as Ubuntu 16.04.3 LTS with the HWE kernel, the NICs will work out of the box.

Intel Atom C3558 Benchmarks

For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results. We do have a full set of expanded benchmarks from our next-gen test suite (Linux-Bench2) which you may see in other STH reviews that include this chip. The target market of the Intel Atom C3558 is embedded applications, which makes the original tests more useful. The Intel Atom C3558 is also a lower-cost chip, so our comparison set includes some other Atom C3000 parts, some Atom C2000 parts, and select offerings from other classes of CPUs such as Xeon D and the low end of Intel Xeon Scalable.

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org and a standard configuration file, then build the standard auto-generated configuration utilizing every thread in the system. We express results in compiles per hour to make them easier to read.
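
As a rough illustration of the compiles-per-hour metric, the conversion from a single build's wall-clock time can be sketched as follows. The helper names and the 600-second example are hypothetical and are not taken from the Linux-Bench scripts; `time_build` simply wraps a `make -jN` call and is not executed here.

```python
import os
import subprocess
import time


def compiles_per_hour(seconds_per_compile):
    """Convert one build's wall-clock time into a compiles-per-hour rate."""
    if seconds_per_compile <= 0:
        raise ValueError("build time must be positive")
    return 3600.0 / seconds_per_compile


def make_command(threads=None):
    """Build a `make -jN` command line, defaulting to every hardware thread."""
    threads = threads or os.cpu_count() or 1
    return ["make", f"-j{threads}"]


def time_build(source_dir, threads=None):
    """Run the build inside source_dir and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(make_command(threads), cwd=source_dir, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical timing, not a measured result: a 600-second build.
    print(compiles_per_hour(600))  # 6.0
```

Expressing the result as a rate means higher is better, which keeps the chart consistent with the other throughput-style benchmarks.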

Intel Atom C3558 Linux Kernel Compile Benchmark

Here the results are good. You can see that performance roughly keeps pace with the lower-end 4-core Atom C2000 chips as well as the Pentium D1508 dual-core (Broadwell-DE) option.

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.

Intel Atom C3558 C Ray Benchmark

In terms of c-ray performance, we see that the Intel Atom C3558 performs relatively well and is considerably ahead of the previous generation Intel Atom C2558.

7-zip Compression Performance

7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

Intel Atom C3558 7zip Benchmark

In terms of compression performance, we can see that the chip is a major step up from the dual-core C3338 that sits below the C3558 in the SKU stack.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. It is an important library in many server stacks. We first look at our sign tests:

Intel Atom C3558 OpenSSL Sign Benchmark

Here is the OpenSSL Verify performance:

Intel Atom C3558 OpenSSL Verify Benchmark

This is an interesting comparison since it shows that the Intel Atom C3558 is about on par with the eight-core Intel Atom C2758, the previous generation’s highest-end part.

We also wanted to show what the -evp results are between the Intel Atom C3558 and previous generation C2558:

Intel Atom C2558 V Intel Atom C3558 AES

As you can see, there is a drastic improvement across the board.
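
For anyone who wants to run a similar comparison on their own hardware, an `openssl speed -evp` run can be wrapped as in the sketch below. The cipher name `aes-128-cbc` is an assumption for illustration; the chart above does not state which ciphers or options were used, and the helper names are ours.

```python
import shutil
import subprocess


def evp_speed_command(cipher="aes-128-cbc"):
    """Build the command line for an EVP-based OpenSSL speed test."""
    return ["openssl", "speed", "-evp", cipher]


def run_evp_speed(cipher="aes-128-cbc"):
    """Run the speed test and return its captured standard output."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl binary not found on PATH")
    result = subprocess.run(
        evp_speed_command(cipher), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command only; the benchmark itself takes a while to run.
    print(" ".join(evp_speed_command()))
```

The `-evp` path goes through OpenSSL's high-level cipher interface, which is where hardware AES acceleration is picked up when the CPU offers it.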

UnixBench Dhrystone 2 and Whetstone Benchmarks

One of our longest running tests is the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:

Intel Atom C3558 UnixBench Dhrystone 2 Benchmark

Here are the Whetstone numbers:

Intel Atom C3558 UnixBench Whetstone Benchmark

As you can see, the performance is strong. Single threaded performance is nowhere near the Intel Xeon E3 line nor the Xeon D. At the same time, if you are looking for generational improvement for edge devices, this is it.

Final Words

Performance, from the perspective of this chip being a 16W part, is nothing short of amazing. We like the fact that performance is up across the board, often venturing into the realm of the previous generation Intel Atom C2758 CPU. Without turbo boost, clocks are clearly limited. We do think this is a candidate for extremely lightweight virtualization and running containers. Intel’s decision to allow dual channel DDR4 RDIMMs gives this generation significantly more memory capacity than the previous generation. Limiting the chip to DDR4-2133 feels like an unnecessary way to create more differentiation in the lineup. Instead, we would have liked to have seen DDR4-2400 as the standard on the Atom C3558.

One area that we are less than keen on is the 12 High-Speed I/O lanes. That means, for example, one gets 8 fewer PCIe or SATA lanes than top bin parts. Simply moving up the stack to the Intel Atom C3758 doubles the cores (8 v. 4), adds up to two more 10GbE ports (4 v. 2), adds a higher-bin QuickAssist, and gives 20 HSIO lanes instead of 12. That major platform upgrade essentially comes at the cost of $107 for the CPU, some small amount for the 10GbE PHY, and 9W TDP. If you want to do more compute heavy tasks, move up the stack. If you are looking for a simple device, then this is a great option.
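
The trade-off described above can be put into numbers with a short sketch. All figures come from the paragraph above (the C3758 TDP is derived as 16W plus the 9W delta); the `Sku` class and function names are ours, and the per-core figure ignores the 10GbE PHY cost, which is not quantified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    max_10gbe_ports: int
    hsio_lanes: int
    tdp_watts: int


# Figures as stated in the text; C3758 TDP is derived (16W + 9W).
C3558 = Sku("Atom C3558", cores=4, max_10gbe_ports=2, hsio_lanes=12, tdp_watts=16)
C3758 = Sku("Atom C3758", cores=8, max_10gbe_ports=4, hsio_lanes=20, tdp_watts=16 + 9)
CPU_PRICE_DELTA_USD = 107  # CPU-only price difference from the text


def upgrade_deltas(low, high):
    """Return what moving from `low` to `high` adds in each dimension."""
    return {
        "cores": high.cores - low.cores,
        "max_10gbe_ports": high.max_10gbe_ports - low.max_10gbe_ports,
        "hsio_lanes": high.hsio_lanes - low.hsio_lanes,
        "tdp_watts": high.tdp_watts - low.tdp_watts,
    }


if __name__ == "__main__":
    deltas = upgrade_deltas(C3558, C3758)
    print(deltas)
    # CPU price delta spread over the extra cores (PHY cost excluded).
    print(CPU_PRICE_DELTA_USD / deltas["cores"])  # 26.75
```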

There are many firewalls, VPN gateways and 4-8 bay NAS units that simply do not need more HSIO or cores. For those numerous applications, the Intel Atom C3558 is a great chip. It is also a SoC we recommend upgrading to if you were contemplating the Intel Atom C3338. Performance and HSIO lanes are significantly better on this new model.

14 COMMENTS

  1. Would you mind simply pasting lscpu output instead of attaching a screenshot? Makes it easier to sift through the CPU flags, it looks like this CPU doesn’t have AVX / AVX2? This seems somewhat odd.

    Also you wrote under Test configuration:
    “Memory: 32GB (2x16GB) Crucial DDR4-2133 RDIMM”
    “We had a single 16GB RDIMM available so that is what we used.”

    And under OpenSSL performance:
    “Atom C3558 is about on part with the eight core Intel Atom C2758” I suppose you meant to write “on par”?

  2. Hi Patrick, thank you for the review.

    What do you mean by “The motherboard we were using was a pre-production sample that did not have 10GbE NICs.”, is there any plan to release another C3558 based motherboard other than A2SDi-4C-HLN4F (which uses the Marvell 88E1543 PHY) as far as you can tell?

    “That major platform upgrade essentially comes at the cost of $107 for the CPU, some small amount for the 10GbE PHY”: too bad Supermicro is charging about twice the amount from what I can see online.

  3. It would be nice to include the AMD Opteron X3000 series into the benchmarks. The price tags on HPE MicroServer Gen10 boxes makes them a clear competitor to NAS boxes based on the smaller C3000 SKUs.

  4. @Martin – great point. They are slightly different products but you are right, there is some overlap at the very lowest end of the market.

    @Safari We actually did the C3558 benchmarks about 6 months before the chip was officially released. The board was ready, Intel kept pushing the date.

    I am unsure on the plans. There is a difference in some boards using 1GbE only and 10GbE. 10GbE boards will cost more for PCB noise isolation and PHYs (from what I am told.) On the 10Gbase-T side, the PHY cost is more significant.

    @Nils we did this previously in forum posts, we had the opposite response.

  5. not trying to be a dick but the word supermicro is a typo in the article. I only noticed it because i copy and pasted looking for this board for sale. Fix if you want, or not. its not that important. you can also delete this comment. thanks

    “Motherboard: Supermciro A2SDi-4C-HLN4F”

  6. @Patrick – That sentence seemed to imply that the final version would have shipped with a 10Gb PHY 🙂

    Regarding the cost, yes I know, but looking at the pricing it seems that SuperMicro is charging more than just BOM costs. For instance the X10SDV-2C-TLN2F is only more expensive than the A2SDi-4C-HLN4F by the CPU (~45$) and PHY (40$) cost, even though the X10SDV series is more complex in terms of CPU pins, power supply, headers, lanes and routing, SMDs – assuming the PCBs have the same number of layers (the A2SDi one looks solid green).
    On the contrary both the A2SDi-8C-HLN4F and A2SDi-H-TF are significantly more expensive than the A2SDi-4C-HLN4F despite being the same design, and my guess is that the A2SDV-8C-TLN5F is going to be twice as expensive despite not being hugely more powerful. So yes, I agree with your considerations, but I don’t think 8 cores are just a matter of BOM, at least current SuperMicro pricing does not reflect it, so it’s probably intentional.

  7. @dave – The D-1541 has the larger instruction set, 45W TDP, higher clocks, 8 cores / 16 threads and about 3x the price. Its competition is more of the Pentium D1508 and D-1518 which is why we included those instead.

  8. Consider adding in some UI controls so this data can be driven on the fly and easily updated (maybe even allow users to add in selected CPU results).

  9. I’m wondering… C2558 was barely enough for plex 1080p transcoding. Would the C3558 be enough for a smooth transcoding, or C3758 still a better option ? The price difference makes C3558 really tempting for a home Freenas/Plex server, but will it handle it ?

  10. @Gustavo said:
    > Would the C3558 be enough for a smooth transcoding, or C3758 still a better option ?

    I would also like to know this. Also, I would be interested in knowing how software like “UnRAID” or “SnapRAID” gets along on such a CPU. Is it good enough for parity calculation and transcoding at the same time? While the benchmarks used on this site may be nice for absolute comparison between different devices, they lack real-life value for a serve(r) at (the) home. 🙂

  11. The C3000 series supports the Intel QAT crypto and compression accelerator, which is supported under Linux, hence it is 10 times faster in many cases than in your benchmarks. SSL and compression are the main things nowadays, hence the conclusions are even wrong. It works for high-end VPN, web proxies with 10 gig interfaces, etc.
