Intel Tremont Low Power Architecture Detailed

Intel Tremont Cover

At STH, the big core architectures get a lot of airtime, but make no mistake, the lower-power architectures are just as important, if not more so. Today Intel is giving more details on its Tremont architecture, which is the first major evolution in the Atom core since Goldmont/Goldmont Plus. Before Intel presented the new architecture today, STH had a pre-briefing that we can finally share.

Intel Tremont Introduction and Goals

Intel Tremont is the company’s next-generation low-power microarchitecture. Big x86 cores are simply not needed in every application. There are a lot of places where low power and density matter more than having huge core counts. Tremont is designed for those applications.

Intel Tremont What Is It

Before we get into details, the summary is fairly simple. Intel has focused on architecture improvements that have yielded significant IPC increases.

Intel Tremont Summary

The architecture is designed to minimize power and packaging footprint, but also to be flexible enough to allow Intel’s product teams to customize it for specific applications.

Intel Tremont Architecture Overview

Before we get too far, here is the full diagram. If you find it too hard to read, a full-size version is available on the linked attachment page.

Intel Tremont Architecture Diagram

We are going to move through the slides that Intel is presenting at a conference today. First, the company is talking about the microarchitecture’s front end.

Intel Tremont Front End

We are not going to simply transcribe text from each slide. Instead, we will let you read them and provide a bit of commentary. The “Core class branch prediction” means that the branch prediction is more akin to what we see in the Intel Core family (e.g. Core i3, i5, i7) instead of a lower-end design. Branch prediction is becoming a theme in this space as even Arm is focusing on that heavily in the Next-Gen Arm Neoverse N1 and E1 Cores.

Intel Tremont Front End Fetch And Predict

The engine is out-of-order, which is different from what we saw in the early days of Atom. The first Atom was a dual-issue in-order design (although single-issue was considered). We now have a 6-wide x86 instruction decode that is split into dual 3-wide clusters. Intel’s design allows the second cluster to spin down when it is not needed, saving power.
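
To put the decode widths in perspective, here is a minimal Python sketch of the peak arithmetic only. The 3-wide cluster width and the two-cluster count come from the slide; the function names are ours, and the model assumes an idealized front end where every active cluster decodes its full width each cycle. It says nothing about how Intel actually divides the instruction stream between clusters.

```python
import math

CLUSTER_WIDTH = 3  # instructions per cycle per decode cluster (per Intel's slide)
NUM_CLUSTERS = 2   # dual clusters, 6-wide in total


def peak_decode_width(active_clusters: int) -> int:
    """Peak instructions decoded per cycle with the given number of active clusters."""
    if not 1 <= active_clusters <= NUM_CLUSTERS:
        raise ValueError("active_clusters must be 1 or 2")
    return CLUSTER_WIDTH * active_clusters


def min_decode_cycles(instructions: int, active_clusters: int) -> int:
    """Lower bound on decode cycles, assuming every active cluster is always full."""
    return math.ceil(instructions / peak_decode_width(active_clusters))


for clusters in (1, 2):
    width = peak_decode_width(clusters)
    cycles = min_decode_cycles(12, clusters)
    print(f"{clusters} cluster(s): {width}-wide, at least {cycles} cycles for 12 instructions")
```

In this idealized view, spinning down the second cluster halves peak decode width, which is the trade Intel is making when the extra width is not needed.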

Intel Tremont Front End Decode

Intel originally told us that the out-of-order window was over 200 entries, but we now have the exact number: 208.

Intel Tremont Integer Execution

Since chips based on Tremont are designed for endpoints, especially in the 5G infrastructure space, we are seeing more crypto acceleration being added. If you want to build an AVX-512 supercomputer, the Tremont vector engines are not what you want, but they are improved over Goldmont Plus.

Intel Tremont Vector Execution

Companies today are designing chips with a heavy emphasis on data movement and caching. Keeping hot data close to the execution units at the right time is important for performance and also for keeping system power consumption low.

Intel Tremont Memory Execution

Intel’s product teams can configure between 1.5MB and 4.5MB of L2 cache. STH asked whether there was a hard correlation, e.g. whether one core could only have 1.5MB or whether four cores required 4.5MB; there is not. Like other lower-power designs, we are not seeing a massive L3 cache here.
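
As a rough illustration of what that flexibility means, the Python sketch below computes the L2 capacity per core for a few combinations. Only the 1.5MB to 4.5MB range comes from Intel. The assumptions that the L2 is shared evenly by the cores attached to it, and that one to four cores is the relevant range (taken from the examples in our question), are ours, and the function name is hypothetical.

```python
L2_MIN_MB = 1.5  # smallest L2 configuration Intel quoted
L2_MAX_MB = 4.5  # largest L2 configuration Intel quoted


def l2_per_core_mb(l2_mb: float, cores: int) -> float:
    """L2 capacity per core, assuming the cache is shared evenly (our assumption)."""
    if not L2_MIN_MB <= l2_mb <= L2_MAX_MB:
        raise ValueError("L2 size is outside the 1.5MB to 4.5MB range")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return l2_mb / cores


# Core count and cache size are chosen independently, so every pairing is allowed.
for cores in (1, 4):
    for l2_mb in (L2_MIN_MB, L2_MAX_MB):
        per_core = l2_per_core_mb(l2_mb, cores)
        print(f"{cores} core(s), {l2_mb}MB L2: {per_core:.3f}MB per core")
```

The spread between the extremes is large: under these assumptions a single core with the maximum cache has twelve times the L2 per core of four cores sharing the minimum.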

Intel Tremont Memory Subsystem

The cache can be either inclusive or non-inclusive based on the fabric it is being attached to. With Tremont, and products like Lakefield that we covered a few months ago, the smaller cores can be combined with larger cores for hybrid chips. Here is a view showing the Sunny Cove big core along with four Tremont Atom cores. Depending on how the Tremont cores are deployed, the L2 cache may need to operate differently.

Intel Lakefield HC31 Two Stacked Dies

Intel is adding features like Speed Shift and also TXT/Boot Guard to the design. Security features, in particular, are a hot topic for those designing and deploying 5G infrastructure, but also in the data center and client spaces.

Intel Tremont New Instructions And Technology

Again, Intel is not making a specific product announcement, but it is giving some sense of performance.

Intel Tremont Architecture Performance

With all of the above changes, Intel focused on getting IPC and single-thread performance up.

Intel Tremont Target Single Thread Performance

Intel showed its set of SPEC CPU2006/2017 bars. Here Intel takes the legacy SPEC CPU2006 components, along with the current SPEC CPU2017 components, on both the integer and floating-point tests and shows the improvements without labeling them.

Intel Tremont Target Single Thread Performance Improvement

Through that methodology, Intel says it is around 30% faster than Goldmont Plus. Goldmont Plus is, in turn, about 30% faster than Goldmont, which is in chips like the Intel Atom C3558. As we saw with the Atom C3000 series, there was a massive IPC jump and feature inclusion over the Atom C2000 series, so we expect next-gen Atom SKUs to be better still.
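
Stacking those two generational claims gives a rough sense of the gap between Tremont and the original Goldmont. The short Python sketch below simply multiplies the ratios. Both 30% figures are approximate, and we are assuming they were measured on a comparable basis, so treat the result as a ballpark estimate rather than a benchmark. The variable and function names are ours.

```python
# Approximate generational speedups quoted above, expressed as ratios.
TREMONT_OVER_GOLDMONT_PLUS = 1.30   # "around 30% faster" (Intel's claim)
GOLDMONT_PLUS_OVER_GOLDMONT = 1.30  # "about 30% faster"


def compound_speedup(*ratios: float) -> float:
    """Multiply per-generation speedup ratios into one overall ratio."""
    overall = 1.0
    for ratio in ratios:
        overall *= ratio
    return overall


tremont_over_goldmont = compound_speedup(
    TREMONT_OVER_GOLDMONT_PLUS, GOLDMONT_PLUS_OVER_GOLDMONT
)
print(f"Tremont vs. Goldmont: roughly {(tremont_over_goldmont - 1) * 100:.0f}% faster")
```

Because the gains compound, two 30% steps work out to roughly 69% rather than 60%.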

One of the technologies we talked about a bit in this article, and covered previously in the linked Hot Chips 31 coverage, is hybrid x86. With Lakefield, four Tremont cores are combined with a larger Sunny Cove core to achieve better combined performance by balancing power with performance needs.

Intel Tremont And Sunny Cove Lakefield Hybrid Performance

Outside of Lakefield, we have not seen other products with Tremont yet, but we like the direction.

Final Words

Tremont is not just a potential Atom replacement in some segments. It is much more than that. As we are already seeing in devices like the newest Microsoft Surface lineup, the ability to add Tremont cores alongside larger, more powerful cores is going to be a game-changing capability for Intel and system designers.

We gently asked, but Intel declined to comment on specific products beyond what has already been announced. Instead, Tremont is the architecture that teams at Intel can implement in products going forward. The flexibility in core counts and cache sizes is a good example of where teams can optimize for specific workloads.

4 COMMENTS

  1. Denverton was significantly slower than I actually expected so a little more beef might be nice for home file servers and small business.

  2. I didn’t see this referenced in the article… so two questions..
    1) Is this a Denverton successor?
    2) Are we expecting a C4XXX series of chips?
