Abstract: The robotics community has consistently aimed to achieve generalizable robot manipulation with flexible natural language instructions. One of the primary challenges is that obtaining robot data fully annotated with both actions and texts is time-consuming and labor-intensive. However, partially annotated data, such as human activity videos without action labels and robot play data without language labels, is much easier to collect. Can we leverage these data to enhance the generalization capability of robots? In this paper, we propose GR-MG, a novel method that supports conditioning on both a language instruction and a goal image. During training, GR-MG samples goal images from trajectories and conditions on both the text and the goal image, or solely on the image when text is unavailable. During inference, where only the text is provided, GR-MG generates the goal image via a diffusion-based image-editing model and conditions on both the text and the generated image. This approach enables GR-MG to leverage large amounts of partially annotated data while still using language to flexibly specify tasks. To generate accurate goal images, we propose a novel progress-guided goal image generation model that injects task progress information into the generation process, significantly improving both fidelity and performance. In simulation experiments, GR-MG improves the average number of tasks completed in a row (out of 5) from 3.35 to 4.04. In real-robot experiments, GR-MG is able to perform 47 different tasks and improves the success rate from 62.5% to 75.0% in simple settings and from 42.4% to 57.6% in generalization settings. Code and checkpoints will be available at the project page: https://gr-mg.github.io/.
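
The training-time conditioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` container, the `sample_training_example` helper, and the `max_offset` parameter are all hypothetical names introduced here, and frames are represented by placeholder strings rather than images. The sketch only shows the idea of taking a later frame of the same trajectory as the goal image and using the text only when it is available.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    """Hypothetical container; frames are placeholder IDs standing in for images."""
    frames: List[str]
    text: Optional[str] = None  # None for data without language labels


def sample_training_example(traj, max_offset=5, rng=random):
    """Pick a current frame and a later frame of the same trajectory as the goal.

    Assumes the trajectory has at least two frames. `max_offset` is an
    illustrative assumption, not a value taken from the paper.
    """
    t = rng.randrange(len(traj.frames) - 1)
    # The goal index lies strictly after t and never past the last frame.
    g = min(t + rng.randint(1, max_offset), len(traj.frames) - 1)
    return {
        "observation": traj.frames[t],
        "goal_image": traj.frames[g],
        "text": traj.text,                  # may be None
        "use_text": traj.text is not None,  # image-only conditioning otherwise
    }
```

With a labeled trajectory the example carries both the text and the goal image; with unlabeled play data `use_text` is false and only the goal image conditions the policy. At inference, per the abstract, the goal image would instead come from the image-editing model, which is outside the scope of this sketch.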

Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning

Mastering Robot Control Through Point-based Reinforcement Learning with Pre-training

Learning Efficient Representations for Goal-conditioned Reinforcement Learning Via Tabu Search

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Become a Proficient Player with Limited Data through Watching Pure Videos

Pre-trained Word Embeddings for Goal-conditional Transfer Learning in Reinforcement Learning

Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

Guiding Pretraining in Reinforcement Learning with Large Language Models

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

Combining Hindsight with Goal-enhanced Prediction for Multi-goal Reinforcement Learning

Focus On What Matters: Separated Models For Visual-Based RL Generalization

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot

A Central Motor System Inspired Pre-training Reinforcement Learning for Robotic Control