At SC20, the November 2020 edition of the Top500 was released. Twice per year, a new Top500.org list comes out, essentially showing the best publicly discussed Linpack clusters. We take these lists and focus on a specific segment: the new systems, meaning those appearing on the Top500 list for the first time. A 2020 trend clearly emerged. The number of new systems on the June 2020 Top500 list (#55) was only 58. That is vastly lower than the June 2019 (94) and November 2019 (102) lists. In November 2020, only 44 systems are noted as being on the list for the first time. Instead of looking at all 500 systems, or even just the top 10, we are going to focus on these 44 systems. Our analysis may seem a bit underwhelming, but that is because the two 2020 lists combined had only as many new systems (58 + 44 = 102) as November 2019 had on its own.
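For readers who want to check the arithmetic, here is a minimal sketch that uses only the new-system counts quoted above. The dictionary and variable names are our own illustration, not anything published by Top500.org.

```python
# New-system counts per Top500 list, as quoted in the text above.
new_systems = {
    "June 2019": 94,
    "November 2019": 102,
    "June 2020": 58,
    "November 2020": 44,
}

# Both 2020 lists together versus the single November 2019 list.
combined_2020 = new_systems["June 2020"] + new_systems["November 2020"]

# November 2020 as a fraction of the year-ago list.
year_ago_ratio = new_systems["November 2020"] / new_systems["November 2019"]

print(f"2020 combined: {combined_2020}")
print(f"November 2019 alone: {new_systems['November 2019']}")
print(f"November 2020 vs. November 2019: {year_ago_ratio:.0%}")
```

The same dictionary can be extended with future lists to keep tracking the trend.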
Having a busy day? We have a video version of this discussion on the STH YouTube channel:
As always, we suggest opening this in a YouTube tab and listening along, as it is a better viewing experience than the embed here. We have never done a video on this now 3-year-old series, so we are trying it out.
Top500 New System CPU Architecture Trends
In this section, we simply look at CPU architecture trends by looking at what new systems enter the Top500 and the CPUs that they use. Let us start by looking at the vendor breakdown.
In June 2020, Arm took the top spot, and the A64FX has another new system on this list. Here we have one new system for Fujitsu and two for NEC. IBM Power did not have a first-time system on this list. x86 was installed on 41 of the 44 new systems. When we break this down by architecture:
This is extremely interesting. On this list, we see a “resurgence” of 2016-era CPUs with a single Intel Xeon E5 V4 “Broadwell” system. We also have 6 of Intel’s 29 systems using first-generation Intel Xeon Scalable processors codenamed Skylake. That means a total of seven “new” systems are using older generations of processors, and even Cascade Lake is not feeling modern at this point. In contrast, AMD did not have a new EPYC 7001 system but did have 12 AMD EPYC 7002 systems.
When we remove pre-2018 x86 “new” systems as we did in previous versions of this analysis, we get a different picture:
Intel still has a commanding share here, but if we look only at current architectures, that share is slightly more muted. In the June 2020 list, Intel had around 76% share with the older x86 CPUs removed. In the November 2020 list, Intel is down to 59% in this metric. AMD has moved up to 27% share overall, or 32% share when removing legacy chips. As we noted in previous analysis pieces, we will track whether this delta continues to increase if customers keep deploying older generations of Intel CPUs.
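The share figures above can be reproduced from the counts given in this section. This is a minimal sketch under the assumption that all seven legacy systems are Intel (one Broadwell plus six Skylake, as described above); the function and variable names are our own.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

total_new = 44        # all new systems on the November 2020 list
intel_new = 29        # new systems with Intel CPUs
amd_new = 12          # new systems with AMD EPYC 7002 CPUs
legacy_intel = 1 + 6  # one Broadwell system plus six Skylake systems

# Drop the legacy systems from both the Intel count and the total.
current_total = total_new - legacy_intel
intel_current = intel_new - legacy_intel

print(f"Intel, all new systems:     {share(intel_new, total_new):.1f}%")
print(f"Intel, legacy removed:      {share(intel_current, current_total):.1f}%")
print(f"AMD, all new systems:       {share(amd_new, total_new):.1f}%")
print(f"AMD, legacy removed:        {share(amd_new, current_total):.1f}%")
```

Because the legacy systems only come out of Intel's count and the total, removing them lowers Intel's share and raises AMD's, which is the delta discussed above.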
CPU Cores Per Socket
Here is an intriguing chart, looking at the new systems and the number of cores they have per socket.
Previously, 20-24 cores were the most popular options by a wide margin. Now the 64-core parts are over 20% of the new systems (9 new systems). For some perspective, in June 2020 there were 8 new systems with 32 cores or more on the list, and none before that. Achieving over 20% is a big move in the industry. Here is a list of all the CPUs added to the list:
We do see a single Intel Xeon Platinum 9242 system, but not the higher-end Intel Xeon Platinum 9282 56-core part. Interestingly the nine 64-core parts were all AMD EPYC 7002 series and none were the AMD EPYC 7H12 we reviewed.
Accelerators or Just NVIDIA?
As with the June 2019 list, NVIDIA is the only accelerator vendor for the new systems. Here is a breakdown in our accelerator by vendor chart:
In June 2020 28 of the 58 new systems used NVIDIA accelerators. Now that is down to 14 of 44. Here is a breakdown of the new accelerated systems by accelerator:
As one can see, the NVIDIA V100 is still the most common accelerator. While NVIDIA Selene, the company's big internal A100 development cluster, grew, there are five other new systems on this list with the A100. Perhaps teams were waiting for the new NVIDIA A100 80GB model.
Acceleration is still an NVIDIA game. With Exascale systems coming soon, including AMD with Frontier and El Capitan along with Intel Xe HPC GPUs for that era, we may see a change over the next few lists as high-end systems get more diverse with accelerators. It is likely that the AMD Instinct MI100, launched today, was not available for systems built and benchmarked for this list. Perhaps this NVIDIA-centric view is why the DoE is investing in software tooling to make AMD and Intel GPUs viable, since NVIDIA effectively owns the HPC acceleration market, with the caveat that CPU-only is the primary alternative.
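To put the accelerator counts quoted earlier in this section on a common scale, here is a small sketch that converts them to attach rates. It uses only the figures from the text (28 of 58 in June 2020, 14 of 44 in November 2020); the helper name is our own.

```python
def attach_rate(accelerated, total):
    """Percentage of new systems that use an accelerator."""
    return 100.0 * accelerated / total

june_2020 = attach_rate(28, 58)      # new systems with NVIDIA accelerators, June 2020
november_2020 = attach_rate(14, 44)  # same metric for November 2020

print(f"June 2020:     {june_2020:.1f}%")
print(f"November 2020: {november_2020:.1f}%")
print(f"Change:        {november_2020 - june_2020:+.1f} points")
```

Expressed this way, the drop is not only in absolute numbers: a smaller fraction of the new systems is accelerated as well.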
Fabric and Networking Trends
A really interesting change in this list is the reversal of the trend toward more Ethernet systems among new Top500 entries.
In June 2020 we saw Ethernet at 53% of the new systems. In this list, Ethernet is only 27% even with HPE Cray Slingshot falling in this category.
It is somewhat surprising here that OPA remains so popular. Many have known since SC18 that Intel’s current interconnect was being sunset, and the product line officially has no roadmap beyond 100Gbps at Intel (it has been taken over by Cornelis Networks). Still, we see just under 7% of the new systems with OPA100. It is losing share, but more slowly than we expected.
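The fabric percentages above can be turned back into approximate system counts. This is a rough sketch: the counts are our own inference from the rounded percentages and the list totals in this article, not figures taken from Top500.org, and the helper name is hypothetical.

```python
def implied_count(percent, total):
    """Nearest whole number of systems matching a rounded percentage."""
    return round(percent / 100 * total)

# (rounded percentage from the text, number of new systems on that list)
quoted = {
    "Ethernet, June 2020": (53, 58),
    "Ethernet, November 2020": (27, 44),
    "OPA100, November 2020": (7, 44),  # the text says "just under 7%"
}

for label, (percent, total) in quoted.items():
    count = implied_count(percent, total)
    exact = 100.0 * count / total
    print(f"{label}: about {count} of {total} systems ({exact:.1f}%)")
```

Under these assumptions, "just under 7%" corresponds to roughly three OPA100 systems out of 44, which shows how few installations move these percentages on a list this small.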
When we look at a breakdown by generation, here is what we get:
Again, notably, 40GbE is not present in the new systems on this list, even though it has been used on many previous Top500 entries as late as November 2019. 10GbE/25GbE has also seen a massive drop-off in this list.
If we drill into which manufacturers are using 10GbE, 25GbE, and 40GbE to be consistent with our previous analysis, here is what we get:
As one can see, Lenovo maintains 10GbE supremacy. Lenovo’s 10GbE systems are often cloud provider installations that are benchmarked with Linpack and added to this list. Many of these systems are installed at “Service Provider T” in China and do not use accelerators. Two of Inspur’s systems are the Inspur Systems NF5468M5 we reviewed with 8x NVIDIA V100 GPUs.
When we look at the vendor picture, we can get a sense of what is happening in the market:
Atos is clearly leading here. HPE, Dell EMC, and Inspur have some solid systems. All of Lenovo’s systems are from service providers or financial services providers in China with five of these systems still being based on 2017-era Intel Skylake with no GPU acceleration.
Clearly, we can see the impact 2020 is having on the HPC industry with only 44 new systems on this list or about 43% of the year-ago volume. Perhaps the good news is that 2021 promises some big systems and some architecture diversification which are good trends to keep the industry healthy. As always, check the Top500.org website for the full list and all lists as they come out every June and November.