Today we are taking a look at the Intel Atom C3955 16 core CPU. This SKU is one of the top-bin parts of a family codenamed “Denverton”. The Intel Atom C3000 series succeeds the Intel Atom C2000 series that first launched in late 2013. Like the Intel Atom C2000 series, there is no Hyper-Threading on the C3000 series so this is a 16 thread CPU. What we are about to see today is something special. We are going to see what happens when one has twice the CPU cores and each core significantly increases speed over the previous generation. There are few times when we see a generational speed up of well over 2x.
Beyond the raw CPU specs, there are other factors to consider when comparing the Intel Atom C3955 to the previous generation C2750. First, one moves from four 1GbE ports (or 2.5Gbps KR) to four 10GbE ports. Second, we are no longer limited by DDR3 ECC SODIMMs. Instead, we can use DDR4-2400 RDIMMs so getting 128GB in 4x 32GB is easy. PCIe is updated to 3.0 and there are now 20x high-speed I/O lanes for SATA 3 or PCIe. For comparison, Avoton had PCIe 2.0 x16, 2x SATA 3 and 4x SATA II. The Atom C2000 architecture was great for the 1GbE world, and worked okay for 10GbE (with feature offload) via an add-in card. Storage-wise, spinning disks and a SATA SSD or two worked fine in the Intel Atom C2000 generation. With the increased demands for higher-speed networking and higher-speed PCIe flash storage, this platform needed to be upgraded.
Key stats for the Intel Atom C3955: 16 cores / 16 threads, 2.1GHz. The CPU features a 32W TDP. This CPU also features a full 20x high-speed I/O lanes and has 4x 10GbE, making it top-bin in terms of features for non-QuickAssist parts. These chips are not socketed so end customer pricing will include a motherboard at a minimum. We expect to see pricing of $700 for these in small quantities. Virtualization features such as VT-d and SR-IOV are supported on this generation.
Here is the lscpu output of the chip. You will notice that some of the higher-end compute features, such as AVX2 and AVX-512, are not present, which makes perfect sense in an embedded CPU.
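For readers who want to run the same check on their own hardware, here is a minimal Python sketch that looks for specific instruction-set flags. It assumes a Linux system that exposes a "flags" line in /proc/cpuinfo (the same data lscpu reports). The function name and the sample text are illustrative and are not taken from our C3955 output.

```python
def missing_flags(cpuinfo_text, wanted=("avx2", "avx512f")):
    """Return the wanted CPU flags that do not appear in the cpuinfo text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists CPU features on lines shaped like "flags : fpu vme ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present.update(value.split())
    return [flag for flag in wanted if flag not in present]


# Made-up sample for illustration only, not real output from any CPU.
sample = "processor\t: 0\nflags\t\t: fpu sse4_2 aes\n"
print(missing_flags(sample))
```

On a live Linux host, the text read from /proc/cpuinfo can be passed in place of the sample string.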
Our test configuration is very similar to what we used for our Intel Atom C2000 series reviews.
- Motherboard: Supermicro A2SDi-H-TP4F
- CPU: Intel Atom C3955
- RAM: 4x 16GB DDR4-2400 RDIMMs (Micron)
- SSD: Intel DC S3710 400GB
- SATADOM: Supermicro 32GB SATADOM
For our review, we are going to focus our energy comparing the Intel Atom C3955 to the previous generation Intel Atom C2000 series parts, other Intel Atom C3000 series CPUs and other low-power single socket solutions that may compete in this segment.
For those wondering, the maximum power consumption with 4x 10GbE links we have seen is around 53W but we will publish formal figures with our platform reviews.
Intel Atom C3955 Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross platform “least common denominator” results. We do have a full set of expanded benchmarks from our next-gen test suite (Linux-Bench2) which you may see in other STH reviews that include this chip. The target market of the Intel Atom C3955 is embedded applications, making the original tests more useful.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org and a standard auto-generated configuration file, then run make utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
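The compiles per hour figure is a simple conversion from wall-clock time. Here is a minimal Python sketch of that arithmetic. It assumes one timed build per run, and the helper names are hypothetical. The actual Linux-Bench script may measure and report this differently.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds):
    """Convert the wall-clock time of one kernel compile into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_kernel_build(source_dir):
    """Time one 'make -jN' run in source_dir, with N set to the logical CPU count.

    Assumes source_dir holds an already configured kernel tree.
    """
    jobs = os.cpu_count() or 1
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=source_dir, check=True)
    return time.perf_counter() - start
```

For example, a build that takes 600 seconds works out to 6 compiles per hour.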
This is perhaps the most interesting benchmark as it stresses multiple parts of the system well beyond L1/L2 caches. Here we see the sixteen core Intel Atom C3000 series perform between the eight core Xeon D’s from Q4 2015 and a modern $400 8 core Intel Xeon Silver 4108. The generation on generation top-of-the-line performance gain from the Intel Atom C2000 to C3000 series is greater than 2x, which we would expect from a part arriving four years later with only 1.6x the TDP.
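The 1.6x TDP figure comes from the 20W Intel Atom C2750 and the 32W Intel Atom C3955. Here is a short Python sketch of that arithmetic. The 2.0 performance ratio is only a placeholder for the "greater than 2x" floor and is not a measured result. The function names are illustrative.

```python
C2750_TDP_W = 20  # previous generation top bin, per this review
C3955_TDP_W = 32  # current generation top bin, per this review


def tdp_ratio(new_tdp_w, old_tdp_w):
    """Ratio of the new part's TDP to the old part's TDP."""
    return new_tdp_w / old_tdp_w


def perf_per_tdp_gain(perf_ratio, tdp_ratio_value):
    """Generational change in performance per TDP watt."""
    return perf_ratio / tdp_ratio_value


ratio = tdp_ratio(C3955_TDP_W, C2750_TDP_W)
# 2.0 is an assumed lower bound, not a benchmark number.
gain = perf_per_tdp_gain(2.0, ratio)
print(f"TDP ratio: {ratio:.2f}x, perf per TDP watt: at least {gain:.2f}x")
```

With a 2x floor on performance, performance per TDP watt improves by at least roughly 1.25x. TDP is a thermal design rating rather than measured power draw, so this is only a rough guide.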
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
In the c-ray tests we added a slightly broader set of results including the NVIDIA Jetson TX2 development platform to show 64-bit ARMv8 embedded performance. We are also showing, in each of these charts, an interesting view: top to bottom Intel Atom C3000 performance. The Intel Atom C3338 is the lowest-end SKU while the Intel Atom C3955 is the highest compute performance SKU.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Again, we are seeing performance that we would put closer to the eight core Xeon D. It is clear that the Intel Atom C3000 line lacks the heavier Xeon cores and L3 cache. Still, if you compare the results to the previous generation, this is a 2x performance improvement.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
If you look at the sysbench tests, you will notice that we are nearing the point where the Intel Atom C3000 series needs only half the cores to hit a similar level of performance to the Intel Atom C2000 series.
OpenSSL is widely used to secure communications between servers. It is an important component of many server stacks. We first look at our sign tests:
Here are the verify results:
As we move to Linux-Bench2, some of the newer -evp results are even more impressive. Stay tuned for that.
UnixBench Dhrystone 2 and Whetstone Benchmarks
One of our longest running tests is the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
And the Whetstone results:
Single threaded performance is certainly better than the Intel Atom C2000 series. However, it is still a long way away from the larger cores.
Generationally, the top bin to top bin CPU comparison between the Intel Atom C2750 and Intel Atom C3955 is no contest. The higher per-core performance figures combined with greatly enhanced memory capacity and I/O make the Intel Atom C3955 a very high-end SKU. For its target market, it is awesome.
One item that deserves significant attention is price. The Intel Atom C2750 was a 20W $171 part. TDP is up with this generation but so is the price. We expect 8 core parts to cost significantly more than they did in the Avoton/Rangeley generation. For 16 core parts, Intel is charging a steep premium of several hundred dollars over the top end. At the end of the day, you buy this CPU because you need the performance and platform features in the 32W TDP. Otherwise, the Intel Xeon D becomes attractive. Likewise, if you need higher performance, the Xeon D, Xeon E3 and Xeon Scalable lines all have options in this price range.
It is hard not to be excited about an ultra low power 16 core x86 part. We first saw the 16-core Denverton at Computex 2016 (Q2) and have been eagerly awaiting its release ever since. Almost five quarters later, we finally have the part, and it is shipping.
We are publishing this review on August 15, 2017, the official launch day. Realistically, chips are just being released by Intel so we would expect at least a month until we start seeing any significant shipping volumes. As these are BGA parts, once Intel launches, OEMs have to affix them to motherboards and test them before they can be sold at retail or in embedded products.
If you want to learn more, we have complete coverage at our Denverton Day Official STH Intel Atom C3000 Launch Coverage Central.