Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_14 Large

At Hot Chips 2023, Google showed off its crazy optically reconfigurable AI network. The company uses optical circuit switching to achieve better performance, lower power, and more flexibility for its AI training cluster. The more amazing part is that Google has had this in production for years.

This is being done live, so please excuse typos.

Google Details its Crazy Optically Reconfigurable AI Network

The big goal of this is to tie together the Google TPU chips.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_02

Here is the 7nm Google TPUv4. We expect to start hearing more about TPUv5 this week. Google usually publishes papers and presentations on hardware that is one generation old. The TPUv4i was the inference version, but this talk focuses on the TPUv4.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_03

Google says it provisions more power than the chips typically draw so that it can meet a 5ms service time SLA. TDP on the chips is therefore much higher than typical power, which leaves headroom to burst when needed to meet that SLA.

Here is the TPUv4 architecture diagram. Google builds these TPU chips not just to be a single accelerator, but to scale out and run as part of large-scale infrastructure.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_04

Here are the Google TPUv4 versus TPUv3 stats in one of the clearest tables we have ever seen on this.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_05

Google has more than doubled the peak FLOPS while reducing power between TPUv3 and TPUv4.

Google has a SparseCore accelerator built into the TPUv4.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_06

Here is Google’s TPUv4 SparseCore performance.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_07

The board itself has four TPUv4 chips and is liquid-cooled. Google said that they had to rework data centers and operations to change to liquid cooling, but the power savings are worth it. The valve on the right controls flow through the liquid cooling tubes. Google says it is like a fan speed controller, but for liquid.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_08

Google also says that it is using PCIe Gen3 x16 back to the host since this was a 2020 design.

Google has power entering from the top of rack like many data centers, but then it has a number of interconnects. Within a rack, Google can use electrical DACs, but outside of a rack, Google needs to use optical cables.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_09

Each system has 64 racks with 4096 interconnected chips. For a sense of scale, an NVIDIA AI cluster at 256 nodes has half as many GPUs.
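
To put those numbers side by side, here is a quick back-of-the-envelope sketch. The rack, chip, and board counts come from this article. The 8 GPUs per NVIDIA node is our assumption (typical of a DGX-style server), not something Google stated.

```python
# Back-of-the-envelope scale check using figures quoted in the article.
RACKS_PER_SYSTEM = 64
CHIPS_PER_SYSTEM = 4096
CHIPS_PER_BOARD = 4  # four TPUv4 chips per liquid-cooled board

chips_per_rack = CHIPS_PER_SYSTEM // RACKS_PER_SYSTEM
boards_per_rack = chips_per_rack // CHIPS_PER_BOARD

# Assumption (not from the talk): 8 GPUs per node in the NVIDIA cluster.
NVIDIA_NODES = 256
GPUS_PER_NODE = 8
nvidia_gpus = NVIDIA_NODES * GPUS_PER_NODE

print(f"TPUv4 chips per rack:  {chips_per_rack}")
print(f"TPUv4 boards per rack: {boards_per_rack}")
print(f"GPUs in 256 nodes:     {nvidia_gpus}")
print(f"GPU-to-TPU ratio:      {nvidia_gpus / CHIPS_PER_SYSTEM}")
```

Under that assumption, the "half as many" comparison works out exactly.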

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_10

Also at the end of the racks, we see a CDU rack. If you want to learn more about liquid cooling, you can see our piece, How Liquid Cooling Servers Works with Gigabyte and CoolIT. We are going to have more liquid cooling content soon. Google says the liquid flow rate is higher than that of water through the hose of a hook and ladder firetruck.

Each rack is a 4x4x4 cube (64 nodes) with optical circuit switching (OCS) between the TPUs. Within the rack, the connections are DACs. The faces of the cube are all optical.
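
As a rough illustration of that split between DACs and optics, the sketch below walks a 4x4x4 cube and counts which links stay inside the rack and which ports land on a face. It assumes each chip has one link in each direction along each of the three axes (six in total), which is our simplification and not a figure from the slides.

```python
from itertools import product

SIDE = 4    # chips per edge of one rack's cube (from the article)
RACKS = 64  # racks per system (from the article)

def classify_links(side=SIDE):
    """Count in-rack links and face ports for one side^3 cube.

    Assumes one link per chip in the + and - direction of each axis.
    """
    internal_links = 0  # chip-to-chip inside the rack (electrical DACs)
    face_ports = 0      # ports on a cube face (optical, toward the OCS)
    for coords in product(range(side), repeat=3):
        for c in coords:
            if c + 1 < side:
                internal_links += 1  # count each internal link once, from its lower end
            else:
                face_ports += 1      # the + direction leaves the cube
            if c == 0:
                face_ports += 1      # the - direction leaves the cube
    return internal_links, face_ports

internal_links, face_ports = classify_links()
chips_per_rack = SIDE ** 3
print(f"chips per rack:       {chips_per_rack}")
print(f"chips per system:     {chips_per_rack * RACKS}")
print(f"in-rack DAC links:    {internal_links}")
print(f"optical face ports:   {face_ports}")
```

Each of the six faces exposes 16 ports in this model, so every rack presents 96 optical ports to the rest of the system.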

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_11

Here is a look at the OCS. Instead of going through an electrical switch, the OCS provides a direct optical connection between chips. Inside, the OCS has 2D MEMS arrays, lenses, cameras, and more. Avoiding the networking overhead allows data to be shared more efficiently. As a quick aside, this in some ways feels akin to DLP TVs.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_12

Google said that it has over 16,000 connections and enough fiber in the super pod to encircle the state of Rhode Island.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_13

Because there is so much point-to-point communication, it requires a lot of fiber strands.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_14 Large

Beyond that, each pool can be connected to larger pools.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_15

The OCS, because it is reconfigurable, can yield higher utilization of the nodes.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_16

Google can then change topologies by adjusting the optical routing.
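
To make the idea concrete, here is a toy model of what reconfiguring an optical circuit switch means. It is only a sketch: `ToyOCS`, `wire_ring`, and `walk_ring` are hypothetical names of ours, and the port numbering (rack r sends on port 2r and receives on port 2r+1) is an assumption made for illustration, not Google's design. The point is that the switch is just a one-to-one port mapping, so a job can be re-patched around an unavailable rack without touching any cables.

```python
class ToyOCS:
    """Toy optical circuit switch: a reconfigurable one-to-one port mapping."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.peer = {}  # port -> port it is currently patched to

    def disconnect(self, port):
        other = self.peer.pop(port, None)
        if other is not None:
            self.peer.pop(other, None)

    def connect(self, a, b):
        for p in (a, b):
            if not 0 <= p < self.num_ports:
                raise ValueError(f"port {p} out of range")
        if a == b:
            raise ValueError("cannot patch a port to itself")
        # A port carries at most one circuit, so tear down old ones first.
        self.disconnect(a)
        self.disconnect(b)
        self.peer[a] = b
        self.peer[b] = a


def wire_ring(ocs, racks):
    """Patch the given racks into a ring (hypothetical port layout)."""
    ocs.peer.clear()
    for i, rack in enumerate(racks):
        nxt = racks[(i + 1) % len(racks)]
        ocs.connect(2 * rack, 2 * nxt + 1)  # rack's "out" to next rack's "in"


def walk_ring(ocs, start, count):
    """Follow the circuits from `start` and list the racks visited."""
    order, rack = [], start
    for _ in range(count):
        order.append(rack)
        rack = (ocs.peer[2 * rack] - 1) // 2
    return order


ocs = ToyOCS(num_ports=8)
wire_ring(ocs, [0, 1, 2, 3])
healthy = walk_ring(ocs, 0, 4)

# Rack 2 becomes unavailable: re-patch the ring around it.
wire_ring(ocs, [0, 1, 3])
degraded = walk_ring(ocs, 0, 3)
print(healthy, degraded)
```

In a statically cabled network, losing rack 2 would break the ring. Here the remaining racks form a smaller ring after a mapping change, which is one way reconfigurability can translate into higher utilization.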

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_17

Here Google is showing the benefit of different topologies.
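
One way to see why the shape matters is to enumerate the 3D shapes a fixed number of chips could be arranged in and compare worst-case hop counts. The sketch below assumes shapes are built from whole 4x4x4 cubes and that every axis has wraparound links (a torus). Both are our assumptions for illustration, and the 512-chip job size is just an example.

```python
from itertools import combinations_with_replacement
from math import prod

CUBE_SIDE = 4  # each rack is a 4x4x4 cube (from the article)

def slice_shapes(num_chips, side=CUBE_SIDE):
    """All (x, y, z) shapes with x <= y <= z, each a multiple of `side`."""
    dims = range(side, num_chips // (side * side) + 1, side)
    return [s for s in combinations_with_replacement(dims, 3)
            if prod(s) == num_chips]

def torus_diameter(shape):
    """Worst-case hop count, assuming wraparound links on every axis."""
    return sum(d // 2 for d in shape)

shapes = slice_shapes(512)
for shape in shapes:
    print(shape, "diameter:", torus_diameter(shape))
```

Under these assumptions, the more cube-like the shape, the fewer hops separate the two most distant chips, while elongated shapes trade that for more chips along one axis. Which one is better depends on how a given model is partitioned.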

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_18

This is important since Google says that changing model needs can drive system changes.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_19

Here is Google’s scaling on a log scale with linear speedups on up to 3072 chips.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_20

Google also increased the on-chip memory to 128MB to keep data access local.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_21

Here is Google’s comparison against the NVIDIA A100 on a performance-per-watt basis.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_22

Here is the PaLM model training over 6144 TPUs in two pods.

Google Machine Learning Supercomputer With An Optically Reconfigurable Interconnect _Page_23

That is a huge number!

Final Words

It is about time for Google to start talking about the TPUv5, and it happens that Google NEXT is this week. Still, this optical interconnect is a really innovative technology.

Something that is pretty clear now is that Google is solving large problems with huge infrastructure. It has the opportunity to push into the AI space more. It is just a question of how fast Google will start pushing its AI hardware and cloud services against NVIDIA while also needing to buy NVIDIA GPUs for customers who use those instead of TPUs.

1 COMMENT

  1. Actually, a lot of what they presented is already implemented on Fugaku at a larger scale, so it is not a big surprise that this has been in production for years.
