DeepMind researchers have developed a groundbreaking model called Robotic Transformer 2 (RT-2) that combines web data with robotics data to enable robots to understand and follow instructions. While high-capacity vision-language models are adept at recognizing patterns in visual and language data, robots also need firsthand experience to handle varied tasks and situations. RT-2 addresses this gap by learning from both web-scale datasets and real-world robot interactions.
The model builds upon its predecessor, Robotic Transformer 1 (RT-1), which was trained on demonstrations collected by 13 robots over a period of 17 months in an office kitchen environment. By leveraging the knowledge gathered from RT-1 and incorporating web-scale data, RT-2 can take generalized instructions and translate them into actions for controlling robots.
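To give a rough sense of what "translating instructions into actions" can look like in practice, the sketch below shows one way a model's discrete output tokens could be decoded back into a continuous robot command. This is a minimal illustration, not DeepMind's published implementation: the bin count, number of action dimensions, and value ranges are assumptions made for the example.

```python
import numpy as np

# Minimal sketch: mapping discrete action tokens emitted by a
# vision-language model back into a continuous robot command.
# NUM_BINS, the 7 action dimensions, and the [-1, 1] ranges are
# illustrative assumptions, not the published RT-2 configuration.

NUM_BINS = 256                      # assumed discretization resolution
ACTION_LOW = np.array([-1.0] * 7)   # assumed per-dimension lower bounds
ACTION_HIGH = np.array([1.0] * 7)   # assumed per-dimension upper bounds

def detokenize_action(tokens):
    """Map integer tokens in [0, NUM_BINS) to continuous action values."""
    tokens = np.asarray(tokens, dtype=np.float64)
    fractions = tokens / (NUM_BINS - 1)
    return ACTION_LOW + fractions * (ACTION_HIGH - ACTION_LOW)

# Example: a model output such as "132 114 128 5 25 156 255" would be
# parsed into integers and mapped to, say, end-effector deltas plus a
# gripper command.
print(detokenize_action([132, 114, 128, 5, 25, 156, 255]))
```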
This innovation is a significant step towards enhancing the capabilities of robots. Instead of relying solely on limited robot data, RT-2 equips robots with a broader understanding of visual and language patterns, allowing them to operate in diverse environments and handle various tasks. The integration of web and robotics data empowers robots to grasp complex instructions and carry out actions more effectively.
Source: DeepMind