Google VCU Video Coding Unit at Hot Chips 33

Google YouTube VCU

We previously covered the Google VCU, or Video Coding Unit, in Google YouTube VCU for Warehouse-scale Video Acceleration. At Hot Chips 33, the company gave more insight into the solution than it did in the original paper. We are doing this one live, so please excuse typos.


Google has some particular challenges. Specifically, it is one of the firms directly impacted by the overall types and mix of Internet traffic. It now says that video is more than 60% of overall Internet traffic, and as video moves to higher resolutions and framerates, bandwidth needs increase.

HC33 Google VCU Video Is A Majority Of Internet Traffic

There are a number of different compression and encoding codecs. Each newer codec is more efficient at compressing video, leading to smaller file sizes and smaller streams.

HC33 Google VCU Video Is Getting Harder To Compress

However, the challenge is not to encode the same content. Instead, it is to work with content that continues to get higher resolution and higher framerate. Also, the higher levels of compression generally require more compute. Ultimately, saving 30-40% on bandwidth is a good goal, but making that a reality requires a lot of compute for a problem that keeps growing.

HC33 Google VCU Video Is Getting Harder To Compress Times Increase
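
To put the 30-40% bandwidth goal in concrete terms, here is a minimal sketch of the arithmetic. The 10 Mbps baseline stream is a made-up illustrative number, not a figure from Google's talk, and `bitrate_after_savings` is a hypothetical helper name.

```python
def bitrate_after_savings(baseline_mbps: float, savings_fraction: float) -> float:
    """Return the stream bitrate left after a fractional bandwidth saving."""
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be in [0, 1)")
    return baseline_mbps * (1.0 - savings_fraction)


if __name__ == "__main__":
    baseline = 10.0  # assumed baseline stream in Mbps, for illustration only
    for savings in (0.30, 0.40):  # the 30-40% range mentioned above
        remaining = bitrate_after_savings(baseline, savings)
        print(f"{savings:.0%} savings: {baseline} Mbps -> {remaining:.1f} Mbps")
```

The saving applies to every stream served, which is why spending extra compute once at encode time can pay off at this scale.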

Google realized it needed to create its own chips to handle higher bitrate source video.

HC33 Google VCU Why Develop Own Video Chips Needs

As a result, Google wanted a number of capabilities not available from commercial products.

HC33 Google VCU Why Develop Own Video Chips Wants NA

It also wanted encoding quality close to that of software encoding, but from a lower-power and faster ASIC.

HC33 Google VCU Why Develop Own Video Chips Near Parity To SW Quality

The impact of deploying the VCU ASICs was a massive decrease in CPU use.

HC33 Google VCU Cut Down YouTube Compute Cycles

Since we are doing these live, we are just going to show the slides for the video encoder cores.

HC33 Google VCU Video Encoder Core 1

Here is the pre-processing:

HC33 Google VCU Video Encoder Core 2

Here is the temporal denoiser/filter:

HC33 Google VCU Video Encoder Core 3

Here is the motion search and rate-distortion optimization engine:

HC33 Google VCU Video Encoder Core 4

Here is the reconstruction and entropy coding:

HC33 Google VCU Video Encoder Core 5

This has the interfaces to read frames and decompress/compress the frame buffer:

HC33 Google VCU Video Encoder Core 6

This has the software programmable registers.

HC33 Google VCU Video Encoder Core 7

Google has teams that design hardware in addition to software. It used a high-level synthesis design flow.

HC33 Google VCU Design Flow

This meant that Google could design the VCU using a higher-level language (C++), making development much faster.

HC33 Google VCU Design Flow CPP

It also kept the limited team working on high-value features and problems.

HC33 Google VCU Design By High Value Problems

Overall, Google seems to be very focused on using VCU ASICs in the future. Google has many applications for video such as YouTube, Google Drive, and more.

HC33 Google VCU Warehouse Scale ASICs

The VCU design goals included maximizing utilization. There are ten encoder cores and three decoder cores along with LPDDR interfaces.

HC33 Google VCU Chip Design Goals

Here is the drill-down into the ASIC:

HC33 Google VCU ASIC

Here is the VCU network on chip topology:

HC33 Google VCU NoC Topology

The VCU has its own firmware that runs the ASIC and, for example, lets userspace choose which codec to use.

HC33 Google VCU VCU Firmware

At the system level, these are deployed with 20 VCUs per system.

HC33 Google VCU System And Rack
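
Combining the per-chip figures (ten encoder cores and three decoder cores per VCU) with 20 VCUs per system gives the per-system core counts. The short sketch below just multiplies those numbers out; the constant and function names are our own, chosen for illustration.

```python
# Figures stated in the talk coverage above.
ENCODER_CORES_PER_VCU = 10
DECODER_CORES_PER_VCU = 3
VCUS_PER_SYSTEM = 20


def cores_per_system(vcus: int = VCUS_PER_SYSTEM) -> dict:
    """Return total encoder and decoder cores for a system with `vcus` VCUs."""
    return {
        "encoder": vcus * ENCODER_CORES_PER_VCU,
        "decoder": vcus * DECODER_CORES_PER_VCU,
    }


if __name__ == "__main__":
    totals = cores_per_system()
    print(f"Encoder cores per system: {totals['encoder']}")
    print(f"Decoder cores per system: {totals['decoder']}")
```

That works out to 200 encoder cores and 60 decoder cores in a single system.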

We covered this in the previous article on the VCU, but here is the architecture from Google’s whitepaper.

Google YouTube VCU System Bandwidth

The net impact is that the VCU is more efficient than a dual-socket 2017-era Intel Xeon system for H.264 and five servers for VP9.

HC33 Google VCU Performance

Google also focuses on building clusters of machines, but it is fairly clear that Google can put a large number of VCUs to work.

HC33 Google VCU Cluster And Beyond

Google also found that, over time, it was able to get its hardware encoders to beat software encoding. The “opportunistic software decoding” is when encoding sometimes happens on the CPUs as they are available. Google also needs to be able to monitor its fleet and determine if, for example, a VCU or an individual core is failing in the data center.

HC33 Google VCU HW SW Co Design
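
As a rough illustration of what a CPU fallback plus health monitoring could look like in a scheduler, here is a minimal sketch assuming a simple first-fit policy. The `Vcu` class, the `place_job` function, and the CPU slot counter are all hypothetical and are not taken from Google's talk; only the ten encoder cores per VCU comes from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vcu:
    """Hypothetical view of one VCU as seen by a scheduler."""
    name: str
    healthy: bool = True          # assumed to be set by fleet monitoring
    free_encoder_cores: int = 10  # ten encoder cores per VCU, per the talk


def place_job(vcus: List[Vcu], cpu_slots_free: int) -> Optional[str]:
    """Pick where one job runs: a VCU name, "cpu", or None to keep it queued."""
    # Prefer the first healthy VCU that still has a free encoder core.
    for vcu in vcus:
        if vcu.healthy and vcu.free_encoder_cores > 0:
            vcu.free_encoder_cores -= 1
            return vcu.name
    # Otherwise fall back to the host CPUs, but only if they have spare capacity.
    if cpu_slots_free > 0:
        return "cpu"
    return None


if __name__ == "__main__":
    fleet = [Vcu("vcu0", healthy=False), Vcu("vcu1", free_encoder_cores=1)]
    print(place_job(fleet, cpu_slots_free=1))  # skips the unhealthy vcu0
    print(place_job(fleet, cpu_slots_free=1))  # no free VCU cores remain
```

A real scheduler would also need to return cores when jobs finish and to account for decoder cores and memory, which this sketch leaves out.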

It seems like Google is reaping a lot of benefit from the VCU.

HC33 Google VCU Conclusion

If Google is showing us its VCU today, there is a non-trivial chance it is either working on a newer version or already has one.

Final Words

Overall, it is great to see Google showing off more about its VCU. In our previous piece, we offered to take a better photo, but were told that some of the not-shown and blurry parts of the VCU image were specifically there to protect IP.

Now if we can just get Google to talk more about its hardware than just the TPU and VCU lines.
