RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess
2023-12-22
Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
Robotics, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation" attempts to solve the following problems:

1. **Multi-task and multi-robot adaptation**:
   - Most real-world robot learning research focuses on one task at a time, because designing tasks and generating robot experience is very costly. Using large-scale heterogeneous robot data to quickly master new skills and adapt to new robots remains an open challenge in robotics.
   - The paper proposes a multi-task, multi-embodiment generalist agent (RoboCat) that leverages heterogeneous experience from different robots and tasks to quickly master new skills and adapt to new robots.
2. **Visual goal-conditioned decision-making**:
   - RoboCat is a visual goal-conditioned decision transformer that consumes action-labelled visual experience. This data covers a range of motor control skills from simulated and real robotic arms, with varying sets of observations and actions.
   - Through visual goal conditioning, RoboCat can generalise to new tasks and robots either zero-shot or through few-shot adaptation (100-1000 examples).
3. **Self-improvement ability**:
   - The trained model itself can be used to generate data for subsequent training iterations, providing a basic building block for an autonomous improvement loop.
   - Through this self-improvement process, RoboCat not only shows cross-task transfer but also adapts to new tasks more efficiently and performs better on existing tasks.
4. **Large-scale evaluation**:
   - The authors evaluate RoboCat's capabilities at scale, both in simulation and on three different real robot embodiments.
   - The results show that as the training data grows and diversifies, RoboCat shows signs of cross-task transfer and adapts to new tasks more efficiently.

### Main contributions

1. **Show, for the first time, that a large-scale Transformer sequence model can solve a large number of dexterous tasks on multiple real robot embodiments**.
2. **Study RoboCat's ability to adapt to unseen tasks from a small amount of expert demonstration data, lowering the barrier to learning new skills**.
3. **Demonstrate a simple and effective self-improvement process for reintegrating these skills into the generalist agent**.
4. **Show that as the training data is expanded and diversified, RoboCat performs better on training tasks and fine-tunes more efficiently to new tasks**.

### Method overview

- **Training phase**:
  - Pre-process images with a VQ-GAN encoder, then train RoboCat on large-scale, diverse task and robot data.
  - Tasks are specified by visual goals; each task is defined by its sets of valid start and end states.
- **Fine-tuning and self-improvement**:
  - Collect 100-1000 expert demonstrations for each new task and fine-tune RoboCat on them.
  - Deploy the fine-tuned policy to autonomously collect more data, which is used to train a new version of RoboCat.
- **Real-world deployment**:
  - Deploy the fine-tuned policies on real robots to collect large-scale data for new tasks.
  - Address success detection and task reset during autonomous data collection, using reward models and a pool of policies to automate resets.

### Experimental setup

- **Robot embodiments**:
  - Simulated and real-world Sawyer and Panda robotic arms, as well as the KUKA 14-DoF robotic arm.
  - Each robotic arm is equipped with a different gripper; the KUKA arm uses a custom-made three-finger gripper.
- **Tasks and object sets**:
  - Tasks include structure building, insertion, and lifting, using multiple real object sets such as RGB objects, NIST-i gears, and YCB fruits and vegetables.
- **Data sources**:
  - Include expert data (data generated by RL-trained agents)
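
The fine-tuning and self-improvement steps from the method overview can be sketched as a short loop. This is only an illustrative toy under stated assumptions, not the authors' implementation: `Agent`, `Episode`, `fine_tune`, `self_improve`, the task name `insert_gear`, and the fixed success pattern are all hypothetical stand-ins, and "training" is reduced to bookkeeping over a growing dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Episode:
    task: str
    source: str    # "demo" or "self_generated"
    success: bool  # label a success detector (e.g. a reward model) would assign


@dataclass
class Agent:
    version: int
    data: List[Episode] = field(default_factory=list)


def fine_tune(agent: Agent, demos: List[Episode]) -> Callable[[int], Episode]:
    """Toy stand-in for fine-tuning: returns a rollout function for one task."""
    task = demos[0].task

    def rollout(i: int) -> Episode:
        # Placeholder outcome (assumption): every third rollout fails. A real
        # system would run the policy on a robot and score the episode.
        return Episode(task, "self_generated", success=(i % 3 != 0))

    return rollout


def self_improve(agent: Agent, demos: List[Episode], n_rollouts: int) -> Agent:
    """One iteration: fine-tune, self-generate data, build the next generalist."""
    policy = fine_tune(agent, demos)
    generated = [policy(i) for i in range(n_rollouts)]
    # "Retraining" here only means creating a new version on the grown dataset.
    return Agent(agent.version + 1, agent.data + demos + generated)


base = Agent(version=0)
demos = [Episode("insert_gear", "demo", True) for _ in range(100)]
new_agent = self_improve(base, demos, n_rollouts=300)
print(new_agent.version, len(new_agent.data))
```

The sketch keeps every self-generated episode and only records a success label; whether and how episodes are filtered before retraining is not stated in this summary, so no filtering is assumed here.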