At STH, the big core architectures get a lot of airtime, but make no mistake, the lower-power architectures are just as important, if not more so. Today Intel is giving more details on its Tremont architecture, which is the first major evolution in the Atom core since Goldmont/Goldmont Plus. Before Intel presented the new architecture today, STH had a pre-briefing that we can finally share.
Intel Tremont Introduction and Goals
Intel Tremont is the company’s next-generation low-power microarchitecture. Big x86 cores are simply not needed in every application. There are a lot of places where low power and density matter more than having huge core counts. Tremont is designed for those applications.
Before we get into details, the summary is fairly simple. Intel has focused on architecture improvements that have yielded significant IPC increases.
The architecture is designed to minimize power and packaging footprint while remaining flexible enough for Intel’s product teams to customize it for specific applications.
Intel Tremont Architecture Overview
Before we get too far, here is the full diagram. If you think this is too hard to read, we have it on a linked attachment page.
We are going to move through the slides Intel is presenting at a conference today. First, the company is talking about the microarchitecture’s front end.
We are not going to simply transcribe text from each slide. Instead, we will let you read them and provide a bit of commentary. The “Core class branch prediction” means that the branch prediction is more akin to what we see in the Intel Core family (e.g. Core i3, i5, i7) instead of a lower-end design. Branch prediction is becoming a theme in this space as even Arm is focusing on that heavily in the Next-Gen Arm Neoverse N1 and E1 Cores.
The engine is out-of-order, which is different from what we saw in the early days of Atom. The first Atom was a dual-issue in-order design (although single-issue was considered). We now have a 6-wide x86 instruction decode that is split into dual 3-wide clusters. Intel’s design lets the second cluster spin down when it is not needed, saving power.
Intel originally told us that the out-of-order window was over 200 entries, but we now have the exact number: 208.
Since chips based on Tremont are designed for endpoints, especially in the 5G infrastructure space, we are seeing more crypto acceleration being added. If you want to build an AVX-512 supercomputer, the Tremont vector engines are not what you want, but they are improved over Goldmont Plus.
Companies today are designing chips with a heavy emphasis on data movement and caching. Keeping hot data close to the execution units at the right time is important not only for performance but also for keeping system power consumption low.
Intel’s product teams can configure between 1.5MB and 4.5MB of L2 cache. STH asked if there was a hard correlation between core count and cache size, e.g. if 1 core could only have 1.5MB or if 4 cores required 4.5MB, and there is not. Like other lower-power designs, we are not seeing a massive L3 cache here.
The cache can be either inclusive or non-inclusive based on the fabric it is being attached to. With Tremont, and products like Lakefield that we covered a few months ago, the smaller cores can be combined with larger cores for hybrid chips. Here is a view showing the Sunny Cove big core along with four Tremont Atom cores. Depending on how the Tremont cores are deployed, the L2 cache may need to operate differently.
Intel is adding features like Speed Shift and also TXT/Boot Guard to the design. Security features, in particular, are a hot topic for those designing and deploying 5G infrastructure, but also in the data center and client spaces.
Again, Intel is not making a specific product announcement, but it is giving some sense of performance.
Intel Tremont Architecture Performance
With all of the above changes, Intel focused on getting IPC and single-thread performance up.
Intel showed its SPEC CPU2006/2017 set of bars. Here Intel takes the legacy SPEC CPU2006 components along with the current SPEC CPU2017 components, on both the integer and floating-point tests, and just shows the improvements without labeling them.
Through that methodology, Intel says Tremont is around 30% faster than Goldmont Plus. Goldmont Plus is, in turn, about 30% faster than Goldmont, which is in chips like the Intel Atom C3558. The Atom C3000 series brought a massive IPC jump and new features over the Atom C2000 series, so we expect next-gen Atom SKUs to be better still.
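To put those two generational figures together, here is a quick back-of-the-envelope sketch. It assumes the two "about 30%" uplifts are comparable and simply multiply, which is our simplification and not something Intel stated, since the benchmark mixes may differ between generations. The function name compound_speedup is ours and purely illustrative.

```python
# Rough estimate only: compounds the approximate uplifts quoted in the article.
# Assumption (ours, not Intel's): the two ~30% figures are comparable and multiply.

def compound_speedup(uplifts):
    """Multiply a list of fractional uplifts (0.30 means 30% faster)."""
    total = 1.0
    for uplift in uplifts:
        total *= 1.0 + uplift
    return total

goldmont_to_goldmont_plus = 0.30   # "about 30%" per the article
goldmont_plus_to_tremont = 0.30    # "around 30%" per Intel's slide

tremont_vs_goldmont = compound_speedup(
    [goldmont_to_goldmont_plus, goldmont_plus_to_tremont]
)
print(f"Tremont vs. Goldmont: roughly {tremont_vs_goldmont:.2f}x")
```

Under those assumptions, Tremont would land at roughly 1.69x Goldmont on this mix of tests, or close to 70% faster than the core found in parts like the Atom C3558. Treat that as a ballpark rather than a measured result.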
One of the technologies we talked about a bit in this article, and covered previously in the linked Hot Chips 31 coverage, is hybrid x86. With Lakefield, four Tremont cores are combined with a larger Sunny Cove core to achieve better combined performance by balancing power with performance needs.
Outside of Lakefield, we have not seen other products with Tremont yet, but we like the direction.
Tremont is not just a potential Atom replacement in some segments. It is much more than that. As we are already seeing in devices like the newest Microsoft Surface lineup, the ability to add Tremont cores alongside larger, more powerful cores is going to be a game-changing capability for Intel and system designers.
We gently asked, but Intel declined to comment on specific products beyond what has already been announced. Instead, Tremont is the architecture that teams at Intel can implement in products going forward. The flexible core counts and cache sizes are good examples of where teams can optimize for specific workloads.