A Journey to Next-Gen Arm Neoverse N1 and E1 Cores


Arm Neoverse E1 Core Architectural Details

Beyond the Arm Neoverse N1, Arm launched a second CPU. The Arm Neoverse E1 is designed for 5G infrastructure and edge compute, and it is one I came away more excited about than I expected when we started the Tech Day.

Arm Neoverse Tech Day 2019 N1 V E1 Positioning

If one thinks about the Arm Neoverse N1 as the chip designed to attack the IPC of Intel Xeon, the Arm Neoverse E1 is going in the other direction. It targets throughput and making accelerators intelligent.

Arm Neoverse Tech Day 2019 E1 Performance

Arm also claims large throughput gains for the Neoverse E1 over the previous generation parts. Instead of focusing on the Cortex-A72, the comparison points for Neoverse E1 are the Cortex-A53 and Cortex-A55.

Arm Neoverse Tech Day 2019 Neoverse E1 5G Impact

The entire industry is working hard to address the 5G rollout. Everyone from the analog-to-digital converter specialists to the DSP and FPGA vendors is targeting the new performance requirements that come with new spectrum coming online. Beyond that, more connected devices with more bandwidth create the need to process data further out, closer to the endpoints. This is the industry the Arm Neoverse E1 is targeting.

Arm Neoverse E1 Architecture

Like the N1, the Arm Neoverse E1 is an Armv8.2 compliant core. Instead of going for higher clock speeds, it is designed to run at low power and be deployed in clusters of cores.

Arm Neoverse Tech Day 2019 Neoverse E1 Design Goals

Here, the Arm Neoverse E1 has small out of order cores with SMT capabilities to ensure the cores can keep throughput high. For comparison, the ThunderX (1) design was an in-order design, as were older Atom chips like the Intel Atom S1260 Centerton.

Arm Neoverse Tech Day 2019 Neoverse E1 Throughput Optimization

Commonly deployed in clusters of up to 8 CPUs, the Arm Neoverse E1 cores share components to maximize space and power efficiency.

Arm Neoverse Tech Day 2019 Neoverse E1 Clusters

The pipeline is only 10 stages long, and you will notice that it is a lot less complex than the Arm Neoverse N1 pipeline.

Arm Neoverse Tech Day 2019 Neoverse E1 Pipeline

SMT is added. That was a major feature of the Broadcom Vulcan design that became the ThunderX2. The ThunderX2 utilized 4-way SMT so that 32 core CPUs have 128 threads. With the Neoverse E1, we have 2-way SMT so each core will look like two separate CPUs.
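To make the thread math above concrete, here is a minimal Python sketch. The function name `logical_cpus` is hypothetical, and the 8-core E1 example simply reuses the cluster size mentioned earlier in this piece. It only multiplies physical cores by SMT ways, which is roughly how an operating system would count logical CPUs.

```python
def logical_cpus(cores: int, smt_ways: int) -> int:
    """Logical CPUs an OS would see: physical cores times SMT threads per core."""
    if cores < 1 or smt_ways < 1:
        raise ValueError("cores and smt_ways must be positive")
    return cores * smt_ways


# ThunderX2 figures quoted in the text: 32 cores with 4-way SMT.
thunderx2_threads = logical_cpus(cores=32, smt_ways=4)

# Assumption for illustration: one full 8-core Neoverse E1 cluster with 2-way SMT.
e1_cluster_threads = logical_cpus(cores=8, smt_ways=2)

print(f"ThunderX2: {thunderx2_threads} threads")
print(f"E1 cluster: {e1_cluster_threads} threads")
```

The ThunderX2 line reproduces the 128 threads quoted above, while a single 8-core E1 cluster would present 16 logical CPUs.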

Other features help performance as well, such as the 128-bit load/store path, which was only half that width on previous generations, and the non-blocking pipeline.

Arm Neoverse Tech Day 2019 Neoverse N1 Multi Threading

The instruction cache is likely to be smaller on the Neoverse E1 designs to save on silicon space. Arm is again betting that heavy branch prediction will be able to keep its cache and cores fed.

Arm Neoverse Tech Day 2019 Fetch Decode Rename

Arm is using an out of order architecture here which is designed to reduce stalls in the CPU and maintain throughput. Frankly, Arm needed to use an OoO model for this type of core.

Arm Neoverse Tech Day 2019 Neoverse E1 OoO Execution

As with the Neoverse N1, the E1 features faster execution than its predecessors. We heard that the Neoverse E1 is technically an Armv8.2 architecture, but some features were pulled in from Armv8.3.

Arm Neoverse Tech Day 2019 Neoverse E1 NEON And INT FP

We are told that most vendors will use 1MB of L3 cache per cluster, but the design supports up to 4MB. The Neoverse E1 is able to support up to 16 outstanding transactions to help keep the cores fed. We were told to expect memory speedups of 4.9x going from the A53 to the E1 and 2.2x going from the A55 to the E1.

Arm Neoverse Tech Day 2019 Neoverse E1 Memory

If the Neoverse N1 core was designed for 1-1.8W of power per core, the Neoverse E1 is both smaller and designed to run at 183mW per core or 5-10x lower. Clock speeds are lower as well, targeting 2.5GHz or lower as optimal.
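The "5-10x lower" claim can be checked against the per-core figures quoted above. This is a minimal sketch that uses only those numbers. The constant and function names are hypothetical, and the figures are Arm's stated design targets rather than measurements.

```python
# Per-core power targets quoted in the text, in watts.
N1_WATTS_LOW = 1.0
N1_WATTS_HIGH = 1.8
E1_WATTS = 0.183


def power_ratio(n1_watts: float, e1_watts: float = E1_WATTS) -> float:
    """How many times more power an N1 core draws than an E1 core."""
    return n1_watts / e1_watts


low = power_ratio(N1_WATTS_LOW)
high = power_ratio(N1_WATTS_HIGH)
print(f"N1 draws {low:.1f}x to {high:.1f}x the power of an E1 core")
```

The result lands at roughly 5.5x to 9.8x, which is consistent with the rounded 5-10x range.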

Arm Neoverse Tech Day 2019 PPA

To put this all in perspective, Arm essentially made an architectural leap akin to when Intel went from the dual-core Intel Atom S1260 “Centerton” to the Atom C2000 Avoton parts. That delta was enormous in September 2013 and from what Arm is showing, the Neoverse E1 has the potential to do the same if its customers choose to make chips accordingly.

Arm Neoverse E1 Impact and Target Market

Arm is using a low power core in the Neoverse E1 instead of the N1 because it sees a developing market at the edge. With the explosion of endpoints pushing data back to the network, Arm sees a need to put intelligence to either act on that data closer to the device for lower latency or to filter the data going back to the cloud. If the Arm Neoverse N1 messaging was to make a CPU for hyperscale cloud vendors, in a way, the Arm Neoverse E1 is the chip to make sure that data from endpoints do not need to make it back to the cloud. Arm wants to make CPUs for the entire value chain, but that is an interesting, albeit valid, position to take.

Arm Neoverse Tech Day 2019 Neoverse E1 Scale

We did not see the Neoverse E1 edge platform reference design at the Tech Day as we did with the Neoverse N1 platform. Arm told us that it was expecting sub 15W SoCs with less than 4W dedicated to 16x Neoverse E1 CPUs providing 32 threads. That is intriguing because Arm was quoting 0.183W per core, which for 16 cores works out to just under 3W, but there are other components to account for between that per-core figure and a fully functioning CPU.
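The gap between the per-core figure and the CPU budget is easy to work through. The sketch below uses only the numbers quoted above. The variable names are hypothetical, and the 4W and 15W values are treated as upper bounds because Arm described them as "less than" and "sub".

```python
E1_CORE_WATTS = 0.183      # per-core figure quoted by Arm
CORES = 16                 # cores in the expected SoC
SMT_WAYS = 2               # 2-way SMT on the Neoverse E1
CPU_BUDGET_WATTS = 4.0     # upper bound: "less than 4W" for the CPUs
SOC_BUDGET_WATTS = 15.0    # upper bound: "sub 15W" for the whole SoC

core_power = CORES * E1_CORE_WATTS            # power for the cores alone
headroom = CPU_BUDGET_WATTS - core_power      # left for everything else in the CPU block
threads = CORES * SMT_WAYS
cpu_share = CPU_BUDGET_WATTS / SOC_BUDGET_WATTS

print(f"Cores alone: {core_power:.3f} W for {threads} threads")
print(f"Headroom inside the CPU budget: up to {headroom:.3f} W")
print(f"CPU block is at most {cpu_share:.0%} of the SoC budget")
```

Under these assumptions the cores account for about 2.9W, leaving up to roughly 1.1W inside the CPU budget for whatever sits around the cores. The source does not break down what that remainder covers.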

Arm Neoverse Tech Day 2019 Neoverse E1 Edge Reference Design 8C

One of the key points is that Arm is pushing its Server Base System Architecture (SBSA). That is a push for the standards that many edge computing devices need to help the overall ecosystem flourish, not just the ecosystem for one chipmaker.

Arm Neoverse Tech Day 2019 Neoverse E1 Small Cell Example

Arm sees 25GbE devices that can sit atop lamp posts and serve as 5G endpoints. That means the Neoverse E1 reference platform is being targeted at very power constrained requirement sets that also must achieve a high throughput.
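To give a sense of what "high throughput" means at 25GbE, here is a rough worst-case packet rate estimate. This is a sketch under stated assumptions and not something Arm presented: it assumes minimum-size 64-byte Ethernet frames, the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap), and the 16-core configuration mentioned above. The names are hypothetical.

```python
LINK_BITS_PER_SEC = 25e9     # 25GbE line rate
MIN_FRAME_BYTES = 64         # assumption: minimum-size Ethernet frames (worst case)
WIRE_OVERHEAD_BYTES = 20     # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap
CORES = 16                   # assumption: the 16-core E1 configuration from the text


def max_packets_per_sec(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on packets per second at line rate for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


pps = max_packets_per_sec(LINK_BITS_PER_SEC, MIN_FRAME_BYTES)
per_core = pps / CORES

print(f"{pps / 1e6:.1f} Mpps at line rate")
print(f"{per_core / 1e6:.2f} Mpps per core if spread evenly over {CORES} cores")
```

At roughly 37 million packets per second in the worst case, it is easy to see why a small, power-constrained design leans on many threads and on offload hardware rather than on a handful of fast cores.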

Arm Neoverse Tech Day 2019 Throughput Marketing

Beyond just the Arm Neoverse E1 cores, the company expects its chipmaker customers to integrate other hardware accelerators with the E1 cores to provide greater efficiency via hardware offload. Small cores allow Arm to provide the OS handling and logic while accelerators handle the tasks they are best at.

Arm Neoverse Tech Day 2019 Neoverse E1 5G Accelerator

Now that 25GbE and 100GbE are mature and have become the de facto standard for what organizations are striving to deploy today, Arm is also looking at what it would take to go beyond 100GbE. That can be a mesh including both Neoverse E1 and Neoverse N1 cores alongside dedicated hardware.

Arm Neoverse Tech Day 2019 Neoverse E1 Beyond 100Gbps

Arm sees benefits to leveraging low power standardized cores. We frankly wish that all of the lower-end development boards adopted Neoverse E1 CPUs to get everything standardized as soon as possible.

Arm Neoverse Tech Day 2019 Neoverse E1 Platform Ecosystem

If one takes a step back, the Arm Neoverse E1 is exactly the type of core that we will need in quantity going forward. If one believes in an explosion of data from sensors like surveillance cameras, then intelligence at the edge must increase to avoid unnecessary data movement and storage. Chipmakers have an additional incentive here. Deploying Arm Neoverse E1 SBSA compliant cores in your devices will help make them more standardized, which will in turn help the Arm ecosystem grow.

Next, we are going to end with our market perspective and final thoughts on the platform.

7 COMMENTS

  1. This was a great long read.

    STH is now like a mix of the technical side of Anandtech, the business side of TNP, and adding in its own mix of hands on experience working with this hw. I can’t wait for your N1 review

  2. Amazing article!
    Arm is set to dominate the EDGE, I don’t really see how Intel hopes to gain any market share with the power draw of the x86 ecosystem. Given how much money they can put into R&D, we should expect to see something from them … and the Big.Little using Atom little cores doesn’t sound like the right approach

  3. Risky89 – Arm Neoverse N1 CPUs will be coming out in a few quarters. The development board with the Neoverse N1 CPU is a low production unit that is primarily going to companies that are building chips.
