State of the Arm Neoverse N1 and Testbed
What you are seeing here is actual Arm Neoverse N1 silicon. This is in the Arm Neoverse SoC Dawn Ares Platform. Much of what Arm discussed was done in the form of RTL implementations and models, but there are Arm Neoverse N1 cores in the wild.
We are going to discuss the platform, then talk about some of the performance figures we got for the Arm Neoverse N1.
Arm Neoverse N1 Dawn Ares Test Platform
The Arm Neoverse N1 test platform is a relatively compact kit. Its purpose is not to let places like STH run benchmarks (although we want to). Instead, it is a platform for those looking to build CPUs atop Neoverse N1 or to do other ecosystem enablement work. For example, if Microsoft wanted to build a 64-core Neoverse N1 chip, this is the platform it might use for testing before getting its silicon back.
The actual Arm Neoverse N1 SoC has two MP2 N1 CPUs with 1MB L2 cache per core. There is also an 8MB system level cache and two DDR4-3200 memory controllers. Looking at this SDP, one can see that Arm is trying to give users all of the parts necessary to simulate the functions of a larger system.
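For a rough sense of scale, the figures above can be turned into aggregate numbers. This is a back-of-the-envelope sketch rather than anything Arm published. It assumes that "MP2" means a two-core cluster and that each DDR4 controller drives one standard 64-bit channel. The result is a theoretical peak, not a measured figure.

```python
# Back-of-the-envelope totals for the N1 SDP SoC described above.
# Assumptions (not confirmed by Arm): "MP2" is a two-core cluster, and
# each DDR4 controller drives a single 64-bit channel.
CLUSTERS = 2
CORES_PER_CLUSTER = 2
L2_MB_PER_CORE = 1
SLC_MB = 8
DDR4_CONTROLLERS = 2
TRANSFERS_PER_SEC = 3200e6  # DDR4-3200 means 3200 mega-transfers per second
BYTES_PER_TRANSFER = 8      # 64-bit data bus

cores = CLUSTERS * CORES_PER_CLUSTER
total_l2_mb = cores * L2_MB_PER_CORE
peak_bw_gbs = DDR4_CONTROLLERS * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9

print(f"cores: {cores}")
print(f"L2 total: {total_l2_mb} MB, system level cache: {SLC_MB} MB")
print(f"theoretical peak DRAM bandwidth: {peak_bw_gbs:.1f} GB/s")
```

Under those assumptions, this is a four-core part with two memory channels, which fits its role as an enablement vehicle rather than a deployment target.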
Here is a shot from above. One can see the main Arm Neoverse N1 SoC. One can also see a Xilinx FPGA. On the x86 side, we are accustomed to having I/O directly on the SoC or northbridge. On Arm development platforms and even RISC-V platforms, we are seeing that offloaded to FPGAs.
Here is the official labeled overview. One will notice that the mATX form factor motherboard is relatively standard for an SDP. We have certainly seen some exotic SDPs in the past. One great example was when STH got an exclusive during the Intel Xeon D launch via a Beverly Cove SDP.
There are three PCIe 3.0 slots: x16, x8, and x1 electrical. Beyond that, there is a fourth slot, labeled both on the diagram and at the slot itself as PCIe Gen 4.0 CCIX. We thought this was very interesting because putting the CCIX/PCIe Gen 4.0 slot on the far edge means that the Gen 4.0 traces are the longest on the board, since that slot is the furthest from the SoC.
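To put the slot mix in perspective, here is a small sketch of theoretical per-direction link bandwidth. It uses the standard PCIe signaling rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and 128b/130b line encoding. The lane count of the CCIX slot is not stated above, so only a per-lane figure is given for Gen 4. The function name `lane_gbs` is our own, and protocol overhead is ignored.

```python
# Theoretical per-direction PCIe bandwidth, ignoring protocol overhead.
# Gen 3 and Gen 4 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130

def lane_gbs(gt_per_s: float) -> float:
    """Return usable GB/s for one lane at the given rate in GT/s."""
    return gt_per_s * ENCODING_EFFICIENCY / 8  # 8 bits per byte

GEN3_GT, GEN4_GT = 8.0, 16.0

# The three Gen 3 slots on the board, by electrical width.
gen3_slots = {"x16": 16, "x8": 8, "x1": 1}
for name, lanes in gen3_slots.items():
    print(f"PCIe 3.0 {name}: {lanes * lane_gbs(GEN3_GT):.2f} GB/s")

# The CCIX slot's width is not given in the text, so report per lane only.
print(f"PCIe 4.0 per lane: {lane_gbs(GEN4_GT):.2f} GB/s")
```

Gen 4 doubles the per-lane rate, which is why trace length to that slot is worth noting.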
We covered that Xilinx has CCIX-enabled FPGAs, and we have seen the Huawei Kunpeng 920 64-Core Arm Server CPU launched with CCIX and PCIe Gen4. Now there is an Arm Neoverse development platform with it enabled. That is clearly a big point Arm is pushing with this SDP, as CCIX opens a world of opportunity for Arm Neoverse.
Since this is STH, even the rear I/O panel gets a glory shot for the SDP.
This is not meant to be a production board. Instead, this is designed for Arm’s primary customers and some ecosystem enablement. When we were at the Tech Day 2019, we were told that low single-digit dozens of these boards had been produced. Frankly, if you are an end-user, a low core count, two DDR4 DIMM solution is not what you want to deploy. If you work at Ampere, you are likely to have engineers fight over these platforms.
Along with the hardware platform, Arm is delivering a software stack to get users up and running.
Again, Arm has a series of software development tools to help vendors and those in the ecosystem use the SDP.
Now that we have shown architectural details and physical silicon, we wanted to discuss performance.
Arm Neoverse N1 Performance
Personally, I think that giving the Arm Neoverse N1 performance talk in Q1 2019 is perhaps one of the hardest jobs. For some perspective here, Arm makes the cores and some surrounding IP. Arm does not license an entire chip. To make a complete chip, one needs IP blocks from other vendors. Once the core IP from Arm, the blocks from third-party vendors, and the chip designer's own IP are integrated, one gets a design ready to go to the foundry. Once chips are produced, one can benchmark actual silicon. We are early in the Neoverse N1 lifecycle, so having performance numbers to stand by is quite an accomplishment.
Arm says that it is doing a lot of hardware and software development. If you remember the STH Software Maturity Model, you can see significant gains just through optimizing code.
Arm believes that it is going to see a massive integer and floating point performance uplift with the Neoverse N1 over the Cortex-A72 based on estimating performance in RTL and using silicon.
One of the key themes is that Arm gets benefits both from hardware as well as software. We have consistently seen new toolchains and kernels help Arm server performance, so this makes sense. Also, NVIDIA has been heavily promoting GPUs as seeing massive gains by counting both software and hardware gains over time. Arm is using that same methodology.
Other examples the company showed involve virtualization, such as faster KVM restore times. Here, the time scale is notably absent.
Nginx is perhaps the world’s most prominent web server these days, and STH has used it for years. Here, Arm is showing a 2x-2.5x performance gain for Neoverse N1.
Likewise, Arm showed some example speedups on operations for Arm Cortex-A72 versus Neoverse N1.
Arm did a solid job showing anticipated speedups. We think some of these figures will shift once chip designers add their own IP to the mix, but it seems like the changes made in Arm Neoverse N1 should offer significant performance improvements. For those who want to see Arm vs. Intel Xeon and AMD EPYC, check out our Cavium ThunderX2 Review and Benchmarks a Real Arm Server Option. While ThunderX2 does not use Neoverse N1, it is what is publicly available in silicon at the time of this writing. Early Cascade Lake shipments have been underway for some time, but the formal launch is still forthcoming.
The Arm Neoverse N1 was perhaps the star of the show, but during the Arm Neoverse Tech Day 2019, the company also showed off its Neoverse E1 architecture for lower power edge applications.