Cerebras WSE-3 AI Chip Launched 56x Larger than NVIDIA H100

Cerebras WSE 3 Wafer

The Cerebras Wafer Scale Engine is one of the coolest chips out there, and one of the few commercially successful AI chip offerings from a startup. We now have the third version of the chip, the Cerebras WSE-3. The new chip leverages another process shrink to achieve even more density than previous generations.

This is the world’s largest AI chip, with over 4 trillion transistors and 46,225mm² of silicon built on TSMC 5nm. For those who have not seen our previous WSE-2 coverage, this chip is the largest square that TSMC can make from a wafer.
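As a quick back-of-envelope exercise on what those figures mean for density (our arithmetic, not a Cerebras number):

    # Back-of-envelope transistor density from the stated figures
    transistors = 4e12          # over 4 trillion transistors
    die_area_mm2 = 46_225       # silicon area in mm^2
    density = transistors / die_area_mm2 / 1e6
    print(f"~{density:.0f}M transistors per mm^2")  # ~87M/mm^2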

Cerebras WSE 3 Wafer Scale Engine 3 Specs

Onboard are 900,000 cores and 44GB of memory. When Cerebras says memory, this is on-die SRAM rather than off-die HBM3E or DDR5. The memory is distributed alongside the cores with the goal of keeping data and compute as close together as possible.
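Dividing those figures out gives a sense of how fine-grained that distribution is. This is our arithmetic, and it lands close to the roughly 48KB-per-core figure Cerebras has cited for earlier WSE generations:

    # Per-core SRAM, assuming memory is spread evenly across all cores
    cores = 900_000
    sram_bytes = 44e9           # 44GB of on-die memory (decimal GB assumed)
    per_core_kb = sram_bytes / cores / 1e3
    print(f"~{per_core_kb:.0f}KB of SRAM per core")  # ~49KB per core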

For a sense of scale, this is the WSE-3 next to the NVIDIA H100.

Cerebras WSE 3 Vs NVIDIA H100

One of the reasons Cerebras has been successful, aside from bringing this giant chip to market, is that it is doing something different from NVIDIA. While companies like NVIDIA, AMD, Intel, and others take a big TSMC wafer and slice it into smaller segments to make their chips, Cerebras keeps the wafer together. In today’s clusters, where there can be tens of thousands of GPUs or AI accelerators working on a problem, reducing the number of chips by a factor of 50+ lowers the interconnect and networking costs and power consumption. In an NVIDIA GPU cluster with InfiniBand, Ethernet, PCIe, and NVLink switches, a huge amount of power and cost is spent on re-linking chips. Cerebras cuts that out by simply keeping the entire wafer together.
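To make the networking point concrete, here is a simplified sketch (our illustration, with made-up cluster sizes) of how endpoint count drives switch count in a basic two-tier fat tree. Real clusters add rails, multiple NICs per GPU, and oversubscription choices, but the scaling is the point:

    import math

    def leaf_switches(endpoints: int, radix: int = 64) -> int:
        # In a simple two-tier fat tree, half of each switch's ports face
        # endpoints and half face the spine layer.
        return math.ceil(endpoints / (radix // 2))

    print(leaf_switches(10_000))        # ~313 leaf switches for 10,000 GPUs
    print(leaf_switches(10_000 // 50))  # ~7 for 50x fewer wafer-scale systems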

Here I am with Andrew Feldman, Cerebras’ CEO, holding the original WSE, with an NVIDIA Tesla V100 in my other hand, just before the pandemic.

Patrick And Andrew Feldman With Cerebras WSE

Here is the side-by-side. One item to note is that, again, Cerebras uses on-die memory while NVIDIA uses on-package memory, so there is no equivalent of the 80GB of HBM3 on the H100 in the Cerebras column.

Cerebras WSE 3 Vs NVIDIA H100 Specs

Of course, having a chip is nice, but it needs to fit into a system. That system is the CS-3.

Cerebras CS-3 Wafer Scale System

The Cerebras CS-3 is the third-generation Wafer Scale system. This has the MTP/MPO fiber connectivity at the top, along with power supplies, fans, and redundant pumps for cooling.

Cerebras CS 3 Front Door Open

Cerebras needs to get power, data, and cooling to the giant chip while also managing things like thermal expansion across a relatively huge area. This is another big engineering win for the company. Inside, the chip is liquid-cooled, and the heat can be removed either via fans or via facility water. We showed how this worked in our Cerebras CS-2 Engine Block Bare on the SC22 Show Floor piece. Here is a video featuring this.

This system and its new chip deliver a roughly 2x performance leap at the same power and price. Cerebras gets a big advantage from each process step, starting with 16nm in the first generation and moving to 5nm today.

Cerebras CS 3 For WSE 3

Compared to the NVIDIA DGX H100 system, with its eight NVIDIA H100 GPUs and internal NVSwitch and PCIe switches, the CS-3 is simply a bigger building block.

Cerebras CS 3 Vs NVIDIA DGX H100

Here is a CS-3 with Supermicro 1U servers.

Cerebras CS 3 And Supermicro NVMe Servers

Here is another Cerebras cluster shot with Supermicro 1U servers. Cerebras generally uses AMD EPYC for the higher core counts, but perhaps also because a lot of the Cerebras team came from SeaMicro, which was acquired by AMD.

Cerebras Cluster With Supermicro NVMe Servers

Something we noticed in this iteration is that Cerebras also has a solution with HPE servers. It is a bit strange since, generally, the Supermicro BigTwin is a step ahead of HPE’s 2U 4-node offerings.

Cerebras CS 3 Systems And HPE At Colovore Data Center

One way to think of the Cerebras CS-2/CS-3 is that they are giant compute machines, but a lot of the data pre-processing, cluster-level tasks, and so forth happens on traditional x86 compute that feeds the optimized AI chip.

Cerebras Condor Galaxy 2 HPE Servers Vertiv Racks

Since this is in a liquid-cooled data center, the air-cooled HPE servers have a rear door heat exchanger setup from ColdLogik, a sub-brand of Legrand. If you want to learn more about rear door heat exchangers compared to other liquid cooling options, you can see our Liquid Cooling Next-Gen Servers Getting Hands-on with 3 Options piece from 2021.

This is a great example of how Cerebras can utilize a liquid-cooled facility without having to bring cold plates to each server node.
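For a rough feel for what a rear door heat exchanger has to absorb, the required water flow follows from Q = ṁ·c·ΔT. The heat load and temperature rise below are our illustrative assumptions, not Cerebras or ColdLogik figures:

    # Water flow to absorb a given heat load: m_dot = Q / (c_p * delta_T)
    q_watts = 23_000    # assumed per-rack heat load in watts (illustrative)
    c_p = 4186          # specific heat of water, J/(kg*K)
    delta_t = 10.0      # assumed water temperature rise across the door, K
    m_dot = q_watts / (c_p * delta_t)               # kg/s
    print(f"~{m_dot:.2f} kg/s, or ~{m_dot * 60:.0f} L/min of water")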

One of the big features of this generation is even larger clusters, up to 2048 CS-3s for up to 256 exaFLOPs of AI compute.
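The headline math works out if you use the 125 petaFLOPs of peak AI compute Cerebras cites per CS-3:

    # 2048 systems at 125 PFLOPs each = 256 exaFLOPs of peak AI compute
    systems = 2048
    pflops_per_cs3 = 125
    print(f"{systems * pflops_per_cs3 / 1000:.0f} exaFLOPs")  # 256 exaFLOPs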

Cerebras CS 3 For 2048 Node 256EF Cluster

That 12PB memory figure is for a top-end hyperscale SKU designed for training GPT-5-size models quickly. Cerebras can also scale down to something like a single CS-2 with supporting servers and networking.

Cerebras CS 3 Clusters For GPT 5 Scale

The memory figure is not just the on-chip memory (44GB per WSE-3) but also includes the memory in the supporting servers.
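To see why external memory capacity matters at this scale, a common rule of thumb puts training state (FP32 master weights plus Adam optimizer moments and a low-precision working copy) at roughly 16 bytes per parameter. The rule of thumb and the model sizes below are our assumptions, not Cerebras figures:

    # Rough training-state footprint at ~16 bytes per parameter
    def training_state_tb(params: float, bytes_per_param: int = 16) -> float:
        return params * bytes_per_param / 1e12

    for params in (175e9, 1e12, 24e12):
        print(f"{params / 1e12:5.2f}T params -> {training_state_tb(params):6.1f} TB")
    # 0.18T -> 2.8 TB, 1.00T -> 16.0 TB, 24.00T -> 384.0 TB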

Cerebras CS 3 Cluster Large Memory

As a result, the Cerebras cluster can train much larger models than before.

Cerebras CS 3 Faster With Larger Clusters

Earlier, we said that Cerebras was a commercial success, and there was an update on that as well.

Cerebras Condor Galaxy Update

A few months ago, we covered how the first $100M+ Cerebras AI Cluster makes it the Post-Legacy Silicon AI Winner. We also discussed the Giant Cerebras Wafer-Scale Cluster in more detail. Not only did Condor Galaxy 1 get an announcement, but it is now complete and training for customers.

Cerebras Condor Galaxy 1 Complete

Cerebras did not just announce one. There is another cluster, Condor Galaxy 2, that is now up and running for G42.

Cerebras Condor Galaxy 2 Complete

The new Condor Galaxy 3 is the Dallas cluster (good luck with Texas property tax on these!) that will use the new 5nm WSE-3 and CS-3 for compute.

Cerebras Condor Galaxy 3 Underway

These are the current US-based clusters in Santa Clara, Stockton, and Dallas, but the plan is to build at least six more.

Cerebras CG 2 And CG 3

The total value of these clusters should be north of $1B, with completion planned for 2024. Aside from the $1B in deal value, Cerebras told us that it is currently supply-limited, so the demand is there for the WSE-3.

Cerebras Condor Galaxy 1 Phase 4

Since we know customers are using these, the question is: how hard are they to use?

Cerebras Programming for AI

You may have seen that Cerebras says its platform is easier to use than NVIDIA’s. One big reason for this is how Cerebras stores weights and activations; it does not have to shard a model across multiple GPUs in a system and then across multiple GPU servers in a cluster.

Cerebras CS 3 Code For GPT 175B Model
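We cannot reproduce the exact slide code here, but the pitch is that models stay as ordinary single-device code. Below is our own illustrative PyTorch-style sketch of the contrast, not Cerebras’ actual API:

    import torch
    import torch.nn as nn

    # A GPT-3 175B-class transformer written as plain single-device code.
    # The "meta" device builds the module without allocating ~350GB of weights.
    with torch.device("meta"):
        model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=12288, nhead=96, dim_feedforward=49152, batch_first=True
            ),
            num_layers=96,
        )

    n_params = sum(p.numel() for p in model.parameters())
    print(f"~{n_params / 1e9:.0f}B parameters")  # ~174B, before embeddings

    # On a GPU cluster, this model would still need FSDP plus tensor and
    # pipeline parallel wrappers before it could train; Cerebras' claim is
    # that its compiler maps the same single-device graph across the wafer.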

Aside from the code changes being easy, Cerebras says that it can train faster than a Meta GPU cluster. Of course, this seems to be a theoretical Cerebras CS-3 cluster at this point, since we have not heard of any 2048-system CS-3 clusters up and running, while Meta already has AI GPU clusters.

Llama 70B Meta Vs Cerebras CS 3 Cluster

Overall, there is a lot going on here, but one thing we know is that more people are using NVIDIA-based clusters today.

Cerebras and… Qualcomm?

While Cerebras is focused on training, it announced a partnership with Qualcomm on inference, using Qualcomm’s existing AI inference accelerators. We first covered the Qualcomm Cloud AI 100 AI Inference Card at Hot Chips 33 in 2021.

Cerebras CS 3 And Qualcomm Cloud AI 100 Ultra

The idea is to have a GPU-free training and inference platform, but it just left us wondering if Qualcomm invested in Cerebras. It felt like that kind of add-on announcement.

Final Words

The Cerebras Wafer Scale Engine series is still a fantastic bit of engineering. A big part of the announcement is that the 5nm WSE-3 is out. One of the cool things is that Cerebras gets a huge bump from process advancements.

We know that the AMD MI300X will easily pass $1B in revenue this year. Cerebras looks on track to pass $1B of revenue, assuming it is selling the entire cluster, not just the multi-million-dollar CS-3 boxes. NVIDIA will sell $1B of hardware next week at GTC as it talks more about the NVIDIA H200 and the next-generation NVIDIA B100. We will get an update on Intel’s Gaudi3, and we have heard a few folks share nine-figure sales forecasts for 2024. Cerebras is perhaps the only training-focused company playing with the larger chipmakers in terms of revenue.

Hopefully we get WSE-4 in the second half of 2025 for another big jump!

4 COMMENTS

  1. A lot about the hardware, nothing about the software.

    Let’s see the tooling and real-life benchmarks.

  2. @Flippo

    There are Cerebras Developer Community Meeting videos on YouTube
    where the real performance and development environment of these devices are presented.
