Google’s AI Supercomputers Beat Nvidia in Speed and Energy Efficiency


Google has long been a leader in artificial intelligence, using it to power everything from its search engine to voice-activated assistants. Now, the tech giant is shedding new light on the technology that fuels its AI systems, releasing details about the supercomputers it uses to train its models.

According to a recent announcement from the company, Google’s supercomputers are both faster and more power-efficient than comparable systems from Nvidia Corp. This is thanks in large part to the Tensor Processing Unit (TPU), a custom chip that Google has designed specifically for AI training.

In fact, Google uses TPUs for more than 90% of its AI training work, which involves feeding data through models to teach them how to perform tasks like generating images or responding to queries with human-like text. The TPU is now in its fourth generation, and Google has strung more than 4,000 of the chips together to create a supercomputer using custom-developed optical switches to connect individual machines.
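
To make that "feeding data through models" step concrete, here is a minimal, illustrative sketch of a single training step written in JAX, Google's open-source framework that runs on TPUs as well as CPUs and GPUs. The one-layer model, toy data, and learning rate are placeholder assumptions for illustration only, not details of Google's actual training stack.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # A single linear layer standing in for a real model.
    return x @ params["w"] + params["b"]

def loss(params, x, y):
    # Mean squared error between predictions and targets.
    return jnp.mean((predict(params, x) - y) ** 2)

@jax.jit  # XLA compiles this step for whatever backend is attached (CPU, GPU, or TPU)
def train_step(params, x, y, lr=0.01):
    grads = jax.grad(loss)(params, x, y)
    # Gradient descent: nudge every parameter against its gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (8, 1)), "b": jnp.zeros((1,))}
x = jax.random.normal(key, (32, 8))   # a toy batch of input data
y = x @ jnp.ones((8, 1))              # synthetic targets
params = train_step(params, x, y)     # one pass of data through the model
```

Training a real model repeats a step like this billions of times, which is why the throughput of the underlying chips matters so much.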

Improving the connections between chips has become a key point of competition among companies that build AI supercomputers. Large language models like those behind Google’s Bard and OpenAI’s ChatGPT have grown dramatically in size, making it impossible to store them on a single chip. Instead, these models must be split across thousands of chips, which then have to work together for weeks or more to train the model.
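
As a rough illustration of that splitting, the sketch below uses JAX's public sharding API to stripe one weight matrix across every attached device, so no single chip has to hold the whole tensor. The mesh axis name, matrix shape, and row-wise partitioning are assumptions chosen for the example; Google's production setup is not public.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# One mesh axis spanning every attached accelerator
# (on a TPU host, these would be the TPU chips).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard a large weight matrix row-wise: each device holds one stripe.
weights = jnp.zeros((jax.device_count() * 1024, 4096))
sharded = jax.device_put(weights, NamedSharding(mesh, PartitionSpec("model", None)))

print(sharded.sharding)  # shows how the rows are distributed across devices
```

At the scale Google describes, the same idea is applied across thousands of chips, which is why the speed of the links between them becomes the bottleneck.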

Google’s largest publicly disclosed language model to date, called PaLM, was trained by splitting it across two of the 4,000-chip supercomputers over a period of 50 days. Google said its supercomputers are designed to make it easy to reconfigure the connections between chips on the fly, which lets the system route around failures and be tuned for performance gains.

In a recent scientific paper, Google reported that its supercomputer is up to 1.7 times faster and 1.9 times more power-efficient than a system built on Nvidia’s A100, the chip that was on the market at the same time as the fourth-generation TPU. Google did not compare the fourth-generation TPU with Nvidia’s current flagship H100 chip because the H100 came to market after Google’s chip and is built with newer manufacturing technology.

Google has hinted that it may be working on a new TPU that would compete with the Nvidia H100, but has provided no details. According to Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson, the TPU’s circuit switching makes it easy to route around failed components and change the topology of the supercomputer interconnect to accelerate the performance of an ML model. With a healthy pipeline of future chips, Google is poised to remain a leader in the field of AI for years to come.
