Here is one of the chippier announcements we have seen in some time. Broadcom has a new switch chip called the Broadcom Jericho3-AI. That “-AI” suffix may make one think the chip performs compute functions, as NVIDIA Infiniband switches do, but that is not the case. Broadcom also uses the Jericho3-AI launch to argue that NVIDIA Infiniband is a poor fit for AI clusters.
Broadcom Jericho3-AI Ethernet Switch Launched
For those who are unfamiliar, Broadcom has three main high-end switch families. The Tomahawk line is the company’s high-bandwidth switch platform. Trident is the platform we often see with more features. Then, at lower bandwidth but with deeper buffers and more programmability, is the Jericho line. The Broadcom Jericho3-AI BCM88890 is the newest member of that third line at 28.8Tbps. This chip has 144 SerDes lanes operating at 106Gbps PAM4. It supports up to 18x 800GbE, 36x 400GbE, or 72x 200GbE network-facing ports.
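As a quick sanity check on those port counts, here is a minimal sketch that spreads each network-facing configuration across the 144 SerDes lanes. It assumes, as is standard for 100G-class PAM4 Ethernet lanes, that each 106Gbps lane carries 100Gbps of Ethernet payload, with the rest going to encoding and FEC overhead. The names `TOTAL_LANES`, `PORT_OPTIONS`, and `lanes_per_port` are hypothetical, chosen only for illustration.

```python
# Illustrative arithmetic only: how the Jericho3-AI network-facing port
# options map onto its 144 SerDes lanes.
# Assumption: each 106Gbps PAM4 lane carries 100Gbps of Ethernet payload
# (the remainder is encoding/FEC overhead).

TOTAL_LANES = 144
PAYLOAD_PER_LANE_GBPS = 100

# Port speed in Gbps -> maximum network-facing port count (from the article)
PORT_OPTIONS = {800: 18, 400: 36, 200: 72}


def lanes_per_port(speed_gbps: int) -> int:
    """Return how many 100Gbps lanes one port of this speed consumes."""
    lanes, remainder = divmod(speed_gbps, PAYLOAD_PER_LANE_GBPS)
    if remainder:
        raise ValueError(f"{speed_gbps}Gbps is not a whole number of lanes")
    return lanes


if __name__ == "__main__":
    for speed, count in PORT_OPTIONS.items():
        lanes = lanes_per_port(speed)
        # e.g. "18x 800GbE: 8 lanes/port, 144 lanes, 14.4 Tbps"
        print(f"{count}x {speed}GbE: {lanes} lanes/port, "
              f"{lanes * count} lanes, {speed * count / 1000:.1f} Tbps")
```

Each option uses exactly 144 lanes (8, 4, or 2 lanes per port) and works out to 14.4Tbps of network-facing Ethernet. The announcement material we saw does not break down how that relates to the 28.8T headline figure.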
Broadcom’s presentation on the new switch chip sets up a simple message: large companies, and even NVIDIA, believe that AI workloads can be constrained by network latency and bandwidth.
The Jericho3-AI fabric is designed to lower the time spent in networking during AI training.
Key features of the Jericho3-AI fabric are load balancing to keep links uncongested, fabric scheduling, zero-impact failover, and a high Ethernet radix. Notably, while NVIDIA NDR Infiniband 400Gbps switches offer features like SHARP in-network compute, when we asked Broadcom whether it had a similar feature, it did not say that it does.
Still, Broadcom says its Jericho3-AI Ethernet beats NVIDIA’s Infiniband by roughly 10% on NCCL performance. Note that the chart Broadcom shows does not use a zero-based scale.
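To see why a non-zero baseline matters, it helps to compare drawn bar heights on a truncated axis. This small sketch uses hypothetical values (a result of 100 and a competitor 10% higher at 110) and a hypothetical axis floor of 90; none of these numbers are read from Broadcom’s chart, and `visual_ratio` is a helper name made up for this example.

```python
# Hypothetical illustration: how a truncated y-axis exaggerates a gap.
# The values 100/110 and the axis floor of 90 are invented for this example
# and are not taken from Broadcom's chart.

def visual_ratio(low: float, high: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    if not axis_min < low <= high:
        raise ValueError("expected axis_min < low <= high")
    return (high - axis_min) / (low - axis_min)


actual = visual_ratio(100, 110)                  # zero-based axis
truncated = visual_ratio(100, 110, axis_min=90)  # axis starts at 90
print(f"true ratio {actual:.2f}x, drawn ratio {truncated:.2f}x")
```

With the axis starting at 90, a 10% advantage (1.10x) is drawn as a bar twice as tall (2.00x), which is why the scale is worth checking before reading too much into the chart.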
Broadcom also says that its support for 800Gbps port speeds (for PCIe Gen6 servers), among other things, makes it the better choice. For a chip with “AI” in its name, it is interesting that Broadcom does not list in-network AI compute functions, since that is a major NVIDIA selling point for its Infiniband architecture.
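The pairing of 800Gbps ports with PCIe Gen6 servers comes down to host bandwidth. Here is a rough sketch using the published per-lane signaling rates (32 GT/s for PCIe Gen5, 64 GT/s for PCIe Gen6) for an x16 slot. It treats 1 GT/s as 1Gbps per lane and ignores encoding, FLIT, and protocol overhead, so the figures are optimistic upper bounds per direction. The helper names are hypothetical, and it assumes one NIC port fed by a single x16 slot.

```python
# Rough upper bound on x16 PCIe slot bandwidth per direction vs. NIC port speed.
# Simplification: 1 GT/s is treated as 1 Gbps per lane; encoding, FLIT, and
# protocol overheads are ignored, so real usable bandwidth is lower.

GT_PER_LANE = {"PCIe Gen5": 32, "PCIe Gen6": 64}
LANES = 16


def raw_x16_gbps(generation: str) -> int:
    """Raw per-direction bandwidth of an x16 slot, before any overhead."""
    return GT_PER_LANE[generation] * LANES


def can_feed(generation: str, port_gbps: int) -> bool:
    """True if the raw x16 bandwidth is at least the Ethernet port speed."""
    return raw_x16_gbps(generation) >= port_gbps


for gen in GT_PER_LANE:
    for port in (400, 800):
        status = "ok" if can_feed(gen, port) else "too slow"
        print(f"{gen} x16 ({raw_x16_gbps(gen)} Gbps raw) -> {port}GbE: {status}")
```

A Gen5 x16 slot (512Gbps raw) can keep up with 400GbE but falls short of 800GbE even before overhead is subtracted, while Gen6 x16 (1024Gbps raw) has headroom for a single 800GbE port.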
Broadcom is also showing its co-packaged optics alongside DACs (direct attach copper cables), which we assume are separate options rather than used together. It does, however, claim that its solution is more energy efficient.
It was a strange announcement since it was very light on speeds and feeds. The Jericho line is not Broadcom’s high-bandwidth line (it trails Tomahawk and Trident there), which is likely why.
We should learn more about Jericho3-AI at the OCP Regional Summit 2023 this week. We also expect it will take some time before we see products with the new chips. Usually, a switch chip is announced, then switches are developed, then production silicon reaches OEMs, and only then do switches actually become available. In the meantime, we have a Tomahawk 4 platform in the lab that we will be showing with NVIDIA ConnectX-7 NICs once the video is finished being edited.
Patrick J Kennedy on Twitter: “Just for some scale. A 4th Gen @Intel Xeon Scalable “Sapphire Rapids” CPU next to a @Broadcom Tomahawk 4 CPU. You should see the difference in the cooler sizes. https://t.co/PlWnABSYkG”