Since its release in 2017, the NVIDIA Tesla V100 has been the industry reference point for accelerator performance. Intel compares its CPUs and AI accelerators to Tesla V100 GPUs for HPC workloads. Every AI company measures its performance to the Tesla V100. Today, that measuring stick changes, and takes a big leap with the NVIDIA A100. Note the cards are actually labeled as GA100 but we are using A100 to align with NVIDIA’s marketing materials.
NVIDIA A100 Key Specs
Let us start with the A100 key specs:
You may ask yourself, what are the asterisks next to many of those numbers, NVIDIA says those are the numbers with structural sparsity enabled. We are going to discuss more of that in a bit. Here is the peak v. measured for Tesla V100 v. A100 that NVIDIA is showing.
Let us dive into some of these details to see why they are impactful.
NVIDIA A100 (SXM) Details
NVIDIA is inventing new math formats and adding Tensor Core acceleration to many of these. Part of the story of the NVIDIA A100’s evolution from the Tesla P100 and Tesla V100 is that it is designed to handle BFLOAT16, TF32, and other new computation formats. This is exceedingly important because it is how NVIDIA is getting claims of 10-20x the performance of previous generations. At the same time, raw FP64 (non-Tensor Core) performance, for example, has gone from 5.3 TFLOPS with the Tesla P100, 7.5 TFLOPS for the SXM2 Tesla V100 (a bit more in the SXM3 versions), and up to 9.7 TFLOPS in the A100. While traditional FP64 double precision is increasing, the accelerators and new formats are on a different curve.
Both BFLOAT16 and TF32 are numerical formats that retain acceptable accuracy while greatly reducing the bits that are used. Lower bits in each number mean less data needs to be moved and faster calculations. Those new formats are augmented by Tensor Cores and structural sparsity. With structural sparsity, the A100 can take advantage of sparse models and avoid doing unnecessary computations. You can read the presentation on structural sparsity here.
Memory has gone from 16GB of HBM2 on the Tesla P100 to 16GB and 32GB on the Tesla V100 and now 40GB. That is a jump, but a more measured jump. A few folks reached out asking why it looks like the A100 package has six Samsung HBM2 stacks but the total is not divisible by 6. Perhaps NVIDIA is planning a 48GB version in the future or there is something else going on. We will update if more details emerge than we have while writing this (before the keynote is live.) Memory bandwidth is up to 1.6TB/s.
As a quick note, on the pre-keynote briefing, Jensen Huang, CEO of NVIDIA hinted that there is a reason NVIDIA does not use HBM2 on its consumer GPUs. He did not say it, but it is likely cost. As a result, we would not expect HBM2 in the next GeForce GPUs.
For die size, the 54B transistor GPU measures 826mm2 and is built on TSMC 7nm. This is a big process shrink that helps NVIDIA pack in much higher densities for more compute. Even with the 7nm shrink, power is going up, now rated at 400W. In our recent Inspur NF5488M5 Review A Unique 8x NVIDIA Tesla V100 Server we noted that “Volta-Next” (e.g. A100) support would be at 400W. Currently, the DGX-2h rates its Tesla V100’s at 450W so our sense is that 400W is not the highest level we will see. Also, this is for the SXM variants. PCIe cards will still be limited by thermals so we expect lower power there.
NVLink speeds have doubled to 600GB/s from 300GB/s. We figured this was the case recently in NVIDIA A100 HGX-2 Edition Shows Updated Specs. That observation seems to be confirmed along with the PCIe Gen4 observation.
The A100 now utilizes PCIe Gen4. That is actually a big deal. With Intel’s major delays of the Ice Lake Xeon platform that will include PCIe Gen4, NVIDIA was forced to move to the AMD EPYC 64-core PCIe Gen4 capable chips for its flagship DGX A100 solution. While Intel is decisively going after NVIDIA with its Xe HPC GPU and Habana Labs acquisition, AMD is a GPU competitor today. Still, NVIDIA had to move to the AMD solution to get PCIe Gen4. NVIDIA’s partners will also likely look to the AMD EPYC 7002 Series to get PCIe Gen4 capable CPUs paired to the latest NVIDIA GPUs.
NVIDIA wanted to stay x86 rather than go to POWER for Gen4 support. The other option would have been to utilize an Ampere Altra Q80-30 or similar as part of NVIDIA CUDA on Arm. It seems like NVIDIA does not have enough faith in Arm server CPUs to move its flagship DGX platform to Arm today. This may well happen in future generations so it does not need to design-in a competitor’s solution.
Multi-Instance GPUs (MIG) are a feature that when you hear about it makes a lot of sense. With this feature, NVIDIA can take an A100 and split it up into as many as seven partitions. Each MIG gets its own dedicated compute resources, caches, memory and memory bandwidth. For cloud providers, they have less incentive to use different GPUs and more incentive to deploy the A100 and partition it into smaller chunks. While the compute resources will go up significantly, the memory available to MIGs will likely be less than the 32GB on a Tesla V100.
What Is the Supply Situation?
I was able to ask Jensen a question directly on the obvious question: supply. Starting today onward, a Tesla V100 for anything that can be accelerated with Tensor Cores on the A100 is a tough sell. As a result, the industry will want the A100. I asked how will NVIDIA prioritize which customers get the supply of new GPUs. Jensen said that the A100 is already in mass production. Cloud customers already have the A100 and that it will be in every major cloud. There are already customers who have the A100. Systems vendors can take the HGX A100 platform and deliver solutions around it. The DGX A100 is available for order today. That is a fairly typical data center launch today where some customers already are deploying before the launch. Still, our sense is that there will be lead times as organizations rush to get the new GPU for hungry AI workloads.
With the first round of GPUs, we are hearing that NVIDIA is focused on the 8x GPU HGX and 4x GPU boards to sell in its own and partner systems. NVIDIA is not just selling these initial A100’s as single PCIe GPUs. Instead, NVIDIA is selling them as pre-assembled GPU and PCB assemblies.
The NVIDIA A100 is a first step for the new GPU. When one looks at the Tesla V100 there are a number of options including:
- Single-slot PCIe 150W 16GB version
- Dual-slot PCIe 16GB
- Dual-slot PCIe 32GB
- Dual-slot PCIe 32GB (V100S)
- SXM3 (350W)
- SXM3 (450W) for the DGX-2h
With at least seven different variants that use the “Tesla V100” label, NVIDIA has some market segments to fill. We expect some segments to perhaps go away this cycle but also new ones such as the EGX A100 we are covering today to be introduced. In the pre-brief for today’s announcement, NVIDIA offered that Ampere inferencing performance is so high that it will supplant Tesla T4’s. While this is likely to happen, at 400W the current iteration will not fit in edge use cases that need lower power envelopes. We look forward to seeing the A100 line fill-out over the coming quarters.
You can check out our piece on how these GPUs are used in our NVIDIA DGX A100 Leapfrogs Previous-Gen piece.
Note: We got late word that this is now the NVIDIA A100 without the “Tesla” branding.