New Qualcomm Centriq 2400 details 48 cores 60MB L3 cache over 2GHz

October 6, 2017

Qualcomm this week presented new details around its upcoming ARM CPU. We had a lot of interest around the Qualcomm Centriq 2400 after the company’s Hot Chips 29 presentation. At that time, the company did not disclose key facts such as clock speeds and cache sizes. This week, Qualcomm presented new facts on the Centriq 2400 10nm chip design.

New Qualcomm Centriq 2400 Details

Here is the Centriq 2400 overview from Hot Chips 29. We now have more details around some of the specifics.

First, in terms of clock speed, we can expect 2.0GHz+ out of the chip. Clock speed affects both performance and power consumption, so going above 2GHz gives some sense of where both will end up.

Also new are the details around the interconnect and L3 cache sizes. A total of 60MB of L3 cache is listed. L2 cache is listed as 512KB shared per cluster, which with 24 clusters gives a total of 12MB of L2 cache.
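As a quick sanity check on those figures, here is a minimal sketch of the cache arithmetic. The variable names are ours, and the only inputs are the numbers quoted above plus the 48-core count from the headline.

```python
# Cache totals for the Centriq 2400, using only figures quoted in this article.
KB_PER_MB = 1024

total_cores = 48          # from the headline
clusters = 24             # L2 is shared per cluster
l2_per_cluster_kb = 512   # KB of L2 per cluster
l3_total_mb = 60          # MB of L3 as listed

cores_per_cluster = total_cores // clusters
l2_total_mb = clusters * l2_per_cluster_kb / KB_PER_MB
combined_mb = l2_total_mb + l3_total_mb

print(f"Cores per cluster: {cores_per_cluster}")
print(f"Total L2: {l2_total_mb:.0f}MB")
print(f"Combined L2+L3: {combined_mb:.0f}MB")
```

The two cores per cluster line up with the "duplex" naming Qualcomm uses for the Falkor core pairs.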

Qualcomm Centriq 2400 Foundational Elements

The inclusion of 8x SATA III 6.0Gbps ports helps us see where this is targeted. A larger SATA array would have made us think this is a storage-focused chip. Eight SATA III ports make sense, as we are seeing a bigger focus on NVMe for primary storage, with 1-2 SATA III boot devices being common. SATA III is quickly becoming the interface of choice for commodity hard drive storage.

We had a lot of this information about the Falkor Core Duplex from the Hot Chips presentation, including details such as the design being AArch64-only. The 512KB L2 cache with ECC is new in this disclosure.

Qualcomm Centriq 2400 Falkor Core Duplex

The L3 cache details are fascinating. There is a distributed 60MB L3 cache that is split into 12x 5MB chunks. The cache can operate in standard or in victim mode.

The on-chip interconnect is a bidirectional segmented ring bus. It is also a multi-ring design. Some may compare this to the rings that we saw on the Intel Xeon E5 series. We see this as substantially different.

Qualcomm Centriq 2400 On Chip Interconnect

Each of the four ring segments has 64GB/s of bandwidth for an aggregate of 256GB/s.

Along these rings there are the 12x 5MB L3 cache chunks as well as the 6 memory controllers that support up to DDR4-2666.
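To put the ring bandwidth next to the memory side, here is a rough sketch of theoretical peak DRAM bandwidth. It assumes one standard 64-bit (8-byte) DDR4 channel per memory controller and decimal gigabytes. Those are our assumptions, not something Qualcomm stated in this disclosure, and real-world throughput will be lower than the theoretical peak.

```python
# Theoretical peak DDR4 bandwidth versus the quoted ring aggregate.
# Assumption (ours): each memory controller drives one 64-bit DDR4 channel.
memory_controllers = 6
transfers_per_sec_m = 2666   # DDR4-2666: mega-transfers per second
bytes_per_transfer = 8       # 64-bit data bus per channel (assumed)

ring_segments = 4
ring_segment_gbs = 64        # GB/s per ring segment, as quoted

per_channel_gbs = transfers_per_sec_m * bytes_per_transfer / 1000
peak_memory_gbs = per_channel_gbs * memory_controllers
ring_aggregate_gbs = ring_segments * ring_segment_gbs

print(f"Per-channel peak: {per_channel_gbs:.1f}GB/s")
print(f"Six-channel peak: {peak_memory_gbs:.1f}GB/s")
print(f"Ring aggregate:   {ring_aggregate_gbs}GB/s")
print(f"Ring / memory:    {ring_aggregate_gbs / peak_memory_gbs:.2f}x")
```

Under those assumptions the ring aggregate is roughly twice the peak DRAM bandwidth, which leaves headroom for core-to-L3 and I/O traffic on top of memory traffic.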

Qualcomm Centriq 2400 Distributed LLC And DDR

Dealing with the large L3 cache chunks and ensuring they are used efficiently by a large number of cores means that Qualcomm has to do a lot of work managing how data utilizes the space.

Qualcomm Centriq 2400 Distributed PoC And Snoop Filter

As a bit of perspective here, Qualcomm has more combined on-chip L2 / L3 cache than the $10,000 Intel Xeon Platinum 8180. That is a huge amount of silicon real estate dedicated to cache.
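Here is a small sketch of that comparison. The Centriq figures come from this article. The Xeon Platinum 8180 figures (28 cores, 1MB of L2 per core, 38.5MB of L3) are taken from Intel's published specifications rather than from Qualcomm's slides, so treat them as our inputs.

```python
# Combined L2+L3 cache: Centriq 2400 versus Xeon Platinum 8180.
# Xeon figures are from Intel's public specs, not from this disclosure.
chips = {
    "Centriq 2400": {"cores": 48, "l2_mb": 12.0, "l3_mb": 60.0},
    "Xeon Platinum 8180": {"cores": 28, "l2_mb": 28 * 1.0, "l3_mb": 38.5},
}

def combined_cache_mb(chip):
    """Return total on-chip L2 + L3 cache in MB."""
    return chip["l2_mb"] + chip["l3_mb"]

for name, chip in chips.items():
    total = combined_cache_mb(chip)
    per_core = total / chip["cores"]
    print(f"{name}: {total:.1f}MB total, {per_core:.3f}MB per core")
```

The Centriq 2400 comes out ahead on total cache, while the Xeon has more cache per core because it spreads its total over fewer cores.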

The distributed IOMMUs help Qualcomm manage resource contention. Our sense is that the Qualcomm Centriq 2400 was intended to have a significant amount of I/O devices connected so this is a major design point.

Qualcomm Centriq 2400 Distributed IOMMUs

We covered the L3 Quality of Service extensions in the Hot Chips piece. The updated slide is slightly different, so we are just going to post it for our readers.

The memory bandwidth compression slide is the same as we saw at Hot Chips, save some formatting. Memory bandwidth compression is a key technology that Qualcomm is using to compress data inline with low latency.

Qualcomm Centriq 2400 Memory Bandwidth Compression

Final Words

Overall, we are pleased to see that Qualcomm will have higher clock speeds. The Cavium ThunderX topped out at 2.5GHz and we are hoping for high clock speeds on the Qualcomm Centriq. High clock speeds help with latency-sensitive requests. We are impressed by the sheer volume of L2 and L3 cache on the Qualcomm Centriq 2400, as 72MB of combined L2+L3 cache is extremely aggressive alongside 48 cores on-die.

Stay tuned for more on the Qualcomm Centriq 2400 from STH.

6 COMMENTS

Greg Stonokowski October 6, 2017 At 11:39 am

I’d say this is promising. That’s lots of cache. I’m just as interested in hearing your input on usability as performance, maybe more.

Ryan October 7, 2017 At 1:46 pm

Embedded dual 1GbE not even 10G? Why even bother? Is anyone really going to run these chips off gigabit ethernet?

I would’ve loved to see them do something cool with embedded 10/25/40/100 and that IOMMU and QoS stuff instead.

Patrick Kennedy October 7, 2017 At 2:42 pm

The hyper-scale guys all use their own NICs. Cavium learned this with ThunderX (1) which is one reason ThunderX 2 does not have 40/50/100GbE.

dfx2 October 8, 2017 At 2:15 pm

Isn’t ThunderX2 based off Broadcom’s Vulcan design rather than the original ThunderX2 presented a couple of years ago? Perhaps that’s the other reason for not having NICs on chip(?) because if it wasn’t originally on Vulcan, it might not make sense to add it.

Patrick Kennedy October 8, 2017 At 2:56 pm

That is true as well @dfx2

dfx2 October 9, 2017 At 10:13 am

It seems quite light on I/O though with 32 PCIe lanes. PC HEDTs have more these days. I’m left with a feeling that they missed a few opportunities to go big. Unless, despite the talk of wanting to take on Xeon, they actually know they’ll realistically take on Atom at the fringes of the datacenter.

They’ll also have to share a roadmap… most server makers take that sort of thing seriously.
