Today Oak Ridge National Laboratory (ORNL) and the US Department of Energy (US DoE) are making a big announcement. In 2021 we will see the Frontier supercomputer reaching levels of performance that are well beyond anything we have today. Frontier is set to be a 1.5 exaflop supercomputer. For some context, the Summit supercomputer, today’s fastest, is around 0.2 exaflops. While Summit was based on IBM Power and NVIDIA, Frontier will go in a different direction. According to today’s announcement, the new Cray supercomputer will be powered by a future AMD EPYC generation as well as a future AMD Radeon Instinct. STH attended a pre-briefing call on the new supercomputer.
Cray and AMD Power Frontier Supercomputer
The first major announcement is that the new US DoE supercomputer housed at ORNL will be built by Cray. Cray said that it will be using a future revision of its Shasta platform. At STH, we saw the Cray Shasta blade based on AMD EPYC at SC18. We are told that particular platform is for AMD EPYC “Rome” CPUs. This is very interesting since we recently covered that Cray Confirms Intel Xeon Platinum 9200 Support. It seems that the promise of such a setup, and more accurately such a setup as it will evolve in two years or so, was not enough to win this contract.
Cray had some staggering figures for the new machine, ORNL’s third generation of supercomputing architecture.
- Frontier will achieve 1.5 exaflops in compute power
- It will be composed of 100 Shasta supercomputer cabinets
- Each cabinet is rated for 300kW
- Cray’s Slingshot interconnect will be used
- The total system size is over 7300 square feet (more than two basketball courts) and will weigh over 1 million pounds
- Frontier will use around 90 miles of cables
- The system build contract is for $500M
- There is another approximately $100M development contract for the programming environment
There is a lot to break down there before we get to the AMD portion.
First, 100 cabinets * 300kW yields 30,000kW, or 30MW, and there are likely other power costs not included in that figure.
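To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above (100 cabinets, 300kW per cabinet, 1.5 exaflops for Frontier, roughly 0.2 exaflops for Summit). The function and variable names are our own, and the efficiency figure is an assumption-laden upper bound: it divides the headline performance number by the rated cabinet power, which is neither the measured draw nor the full facility power.

```python
# Back-of-the-envelope figures from the announcement (not measured values).
CABINETS = 100            # Shasta cabinets
KW_PER_CABINET = 300      # rated power per cabinet, in kW
FRONTIER_EXAFLOPS = 1.5   # headline compute figure
SUMMIT_EXAFLOPS = 0.2     # approximate figure for Summit


def cabinet_power_mw(cabinets: int, kw_per_cabinet: float) -> float:
    """Total rated cabinet power in megawatts."""
    return cabinets * kw_per_cabinet / 1000


def gigaflops_per_watt(exaflops: float, megawatts: float) -> float:
    """Implied efficiency, assuming all rated power goes to compute."""
    flops = exaflops * 1e18
    watts = megawatts * 1e6
    return flops / watts / 1e9


power_mw = cabinet_power_mw(CABINETS, KW_PER_CABINET)
efficiency = gigaflops_per_watt(FRONTIER_EXAFLOPS, power_mw)
speedup = FRONTIER_EXAFLOPS / SUMMIT_EXAFLOPS

print(f"Rated cabinet power: {power_mw:.0f} MW")
print(f"Implied efficiency:  {efficiency:.0f} gigaflops per watt")
print(f"Versus Summit:       {speedup:.1f}x")
```

On these numbers the rated cabinet power is 30MW, which implies roughly 50 gigaflops per watt and about 7.5 times Summit's performance. Any additional facility power would lower the real efficiency figure.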
Cray’s new Slingshot interconnect is fascinating. While InfiniBand and Omni-Path have become popular in the HPC space, Slingshot is able to handle low-latency adaptive routing for different traffic on the fabric while maintaining compatibility with Ethernet.
Shasta and Slingshot are a big deal. If you believe that enterprises are about to boom, and cloud providers will want to offer HPC systems, Slingshot compatibility with Ethernet means that it aligns more closely to public/private cloud standards rather than more niche interconnects. Cray stated that its goal is to deliver the technology in Frontier to organizations starting in a single 19″ rack. The company says the architecture is purpose-built for Exascale, but it is clear Cray has HPC and AI ambitions outside of Frontier.
$600M (or more) for the new system is also a big figure. When it is complete, it will likely be the most costly system in existence and is designed to do HPC and AI workloads at a scale that has never been done before. This is a big check being written to do big science.
New AMD Technology in 2021 Powering Frontier
During the pre-brief, Dr. Lisa Su, AMD’s CEO, outlined a number of advancements AMD will make for the 2021 system. First, the CPUs will be a future generation AMD EPYC product. Second, the GPUs will be a future Radeon Instinct product. Neither of these two announcements is completely surprising. What they do show, in the context of Frontier, is that AMD has convinced Cray and the US DoE that in 2021 it will have CPU and GPU platforms that are worth the investment. With a $100M development contract and the fact that the largest supercomputer in the world will be using AMD in two years, this will do a lot to bolster AMD’s market perception and support.
We are told the actual architecture will involve a single AMD EPYC CPU mated to four AMD Radeon Instinct GPUs. On the pre-brief, we were given the ratio, not the topology, but were told that a future, coherent Infinity Fabric will tie CPUs and GPUs together. If you saw our recent Gen-Z in Dell EMC PowerEdge MX and CXL Implications article and were wondering if coherency is going to be a big topic in the next few years, this is further proof that it will be.
AMD says the CPUs themselves are based on a future “Zen” core design. We were told that this future AMD EPYC processor will be customized for HPC and AI, adding new instructions and an architecture optimized for AI and supercomputing.
For the US DoE, Frontier will be a big system that will enable big science. Beyond traditional HPC, the focus on bringing cloud-like management and AI capabilities to the system gives the broader ecosystem direction. For Cray, this is probably the company’s largest single contract ever, or at least its largest publicly disclosed one. Shasta and Frontier can be the way that Cray pushes into the higher-end enterprise market. For AMD, this is a great proof point and validation of the company’s strategy on both the CPU and GPU side. Perhaps the most intriguing aspect of Frontier and the technologies that comprise the system is that it is not a theoretical exercise in the future. Instead, it is something we could see in the next major hardware cycle or two.
Here is the live stream announcement recording: