DeepMind researchers have developed a groundbreaking model called Robotic Transformer 2 (RT-2) that combines web data with robotics data to enable robots to understand and follow instructions. While high-capacity vision-language models are adept at recognizing patterns in visual and language data, robots also need firsthand experience to handle varied tasks and situations. RT-2 addresses this gap by learning from both web-scale datasets and real-world robot interactions.
The model builds upon its predecessor, Robotic Transformer 1 (RT-1), which was trained on demonstrations collected by 13 robots over a period of 17 months in an office kitchen environment. By leveraging the knowledge gathered from RT-1 and incorporating web-scale data, RT-2 can take generalized instructions and translate them into actions for controlling robots.
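To give a rough sense of what "translating instructions into actions" can look like in practice, the sketch below shows one way a model's discrete output tokens could be decoded back into a continuous robot command. This is a minimal illustration, not DeepMind's published implementation: the bin count, number of action dimensions, and value ranges are assumptions made for the example.

```python
import numpy as np

# Minimal sketch: mapping discrete action tokens emitted by a
# vision-language model back into a continuous robot command.
# NUM_BINS, the 7 action dimensions, and the [-1, 1] ranges are
# illustrative assumptions, not the published RT-2 configuration.

NUM_BINS = 256                      # assumed discretization resolution
ACTION_LOW = np.array([-1.0] * 7)   # assumed per-dimension lower bounds
ACTION_HIGH = np.array([1.0] * 7)   # assumed per-dimension upper bounds

def detokenize_action(tokens):
    """Map integer tokens in [0, NUM_BINS) to continuous action values."""
    tokens = np.asarray(tokens, dtype=np.float64)
    fractions = tokens / (NUM_BINS - 1)
    return ACTION_LOW + fractions * (ACTION_HIGH - ACTION_LOW)

# Example: a model output such as "132 114 128 5 25 156 255" would be
# parsed into integers and mapped to, say, end-effector deltas plus a
# gripper command.
print(detokenize_action([132, 114, 128, 5, 25, 156, 255]))
```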
This innovation is a significant step towards enhancing the capabilities of robots. Instead of relying solely on limited robot data, RT-2 equips robots with a broader understanding of visual and language patterns, allowing them to operate in diverse environments and handle various tasks. The integration of web and robotics data empowers robots to grasp complex instructions and carry out actions more effectively.
Source: DeepMind