Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network

August 29, 2023

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_14 Large

At Hot Chips 2023, Google showed off its crazy optically reconfigurable AI network. The company is doing optical circuit switching to achieve better performance, lower power, and more flexibility for its AI training cluster. The more amazing part is that they have had this in production for years.

This is being done live, so please excuse typos.

Google Details its Crazy Optically Reconfigurable AI Network

The big goal of this is to tie together the Google TPU chips.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_02

Here is the 7nm Google TPUv4. We expect this week we will start hearing more about TPUv5. Google usually can do papers and presentations about one-generation old hardware. The TPU v4i was the inference version, but this is more the TPUv4 focused talk.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_03

Google says it overprovisions power compared to typical power so it can fulfill a 5ms service time SLA. So TDP on the chips is much higher, but that is to allow bursting to meet those SLA bursts.

Here is the TPUv4 architecture diagram. Google builds these TPU chips not just to be a single accelerator, but to scale out and run as part of large-scale infrastructure.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_04

Here is the Google TPUv4 versus TPUv3 stats in one of the clearest tables we have ever seen on this.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_05

Google has more than doubled the peak FLOPS, but reduced the power between TPUv3 and TPUv4.

Google has a SparseCore accelerator built into the TPUv4.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_06

Here is Google’s TPUv4 SparseCore performance.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_07

The board itself has four TPUv4 chips and is liquid-cooled. Google said that they had to rework data centers and operations to change to liquid cooling, but the power savings are worth it. The valve on the right controls flow through the liquid cooling tubes. Google says it is like a fan speed controller, but for liquid.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_08 — Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect

Google also says that it is using PCIe Gen3 x16 back to the host since this was a 2020 design.

Google has power entering from the top of rack like many data centers, but then it has a number of interconnects. Within a rack, Google can use electrical DACs, but outside of a rack, Google needs to use optical cables.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_09

Each system has 64 racks with 4096 interconnected chips. For some sense, NVIDIA’s AI clusters at 256 nodes have half as many GPUs.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_10

Also at the end of the racks, we see a CDU rack. If you want to learn more about liquid cooling, you can see our How Liquid Cooling Servers Works with Gigabyte and CoolIT. We are going to have more liquid cooling content soon. Google says the flow rates of liquid are higher than water in a hook and ladder firetruck’s hose.

Each rack is a 4x4x4 cube (64 nodes) with optical circuit switching (OCS) between the TPUs. Within the rack, the connections are DACs. The faces of the cube are all optical.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_11

Here is a look at the OCS. Instead of using an electrical switch, using the OCS gives a direct connection between chips. Google has internal 2D MEMS arrays, lenses, cameras and more. Avoiding all of the networking overhead allows sharing of data more efficiently. As a quick aside, this in some ways feels akin to DLP TVs.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_12

Google said that it has over 16,000 connections and enough distance of fiber in the super pod that it can encircle the state of Rhode Island.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_13

Because there is so much point-to-point communication, it requires a lot of fiber strands.

Beyond that each pool can be connected to larger pools.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_15

The OCS, because it is reconfigurable, can yield higher utilization of the nodes.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_16

Google can then change topologies by adjusting the optical routing.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_17

Here Google is showing the benefit of different topologies.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_18

This is important since Google says that the changes in model needs can drive system changes.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_19

Here is Google’s scaling on a log scale with linear speedups on up to 3072 chips.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_20

Google also increased the on-chip memory to 128MB to keep data access local.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_21

Here is Google’s comparison against the NVIDIA A100 on a performance-per-watt basis.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_22

Here is the PaLM model training over 6144 TPUs in two pods.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_23

That is a huge number!

Final Words

It is about time for Google to start talking about the TPUv5, and it happens that Google NEXT is this week. Still, this optical interconnect is a really innovative technology.

Something that is pretty clear now, is that Google is solving large problems with huge infrastructure. It has the opportunity to push into the AI space more. It is just a question of how fast Google will start pushing its AI hardware and cloud services against NVIDIA while also needing to buy NVIDIA GPUs for its customers using those instead of TPUs.

1 COMMENT

Gnattu August 29, 2023 At 10:27 am

Actually a lot of things they presented is already implemented on Fugaku with a larger scale, so it is not a big surprise that this is in production for years.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Google Details its Crazy Optically Reconfigurable AI Network

Final Words

RELATED ARTICLESMORE FROM AUTHOR

The Top NVIDIA HGX B200 Server Supermicro SYS-422GA-NBRT-LCC at OCP 2024

Meta Announces AMD Instinct MI300X for AI Inference and NVIDIA GB200 Catalina

AMD Pensando Pollara 400 UltraEthernet RDMA NIC Launched

1 COMMENT

LEAVE A REPLY

RELATED ARTICLES MORE FROM AUTHOR