TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real Transfer

Jun Yamada, Marc Rigter, Jack Collins, Ingmar Posner

2023-11-07

Abstract: Model-based RL is a promising approach for real-world robotics due to its improved sample efficiency and generalization capabilities compared to model-free RL. However, effective model-based RL solutions for vision-based real-world applications require bridging the sim-to-real gap for any world model learnt. Due to its significant computational cost, standard domain randomisation does not provide an effective solution to this problem. This paper proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer) to achieve efficient sim-to-real transfer of vision-based model-based RL using distillation. Specifically, TWIST leverages state observations as readily accessible, privileged information commonly garnered from a simulator to significantly accelerate sim-to-real transfer. A teacher world model is trained efficiently on state information. At the same time, a matching dataset of domain-randomised image observations is collected. The teacher world model then supervises a student world model that takes the domain-randomised image observations as input. By distilling the learned latent dynamics model from the teacher to the student model, TWIST achieves efficient and effective sim-to-real transfer for vision-based model-based RL tasks. Experiments in simulated and real robotics tasks demonstrate that our approach outperforms naive domain randomisation and model-free methods in terms of sample efficiency and task performance of sim-to-real transfer.

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper addresses efficient sim-to-real transfer for vision-based model-based reinforcement learning (RL) in real-world robotic tasks. It proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer), a method that uses teacher-student knowledge distillation to accelerate the transfer of vision-based model-based RL from simulation to the real world.

The paper points out that although model-based RL has advantages in sample efficiency and generalization, it still faces challenges in real-world applications: training the world model requires a large amount of data, and the learned model must bridge the gap between simulation and the real world. Standard domain randomization (DR) improves generalization but significantly increases the amount of data required for training, which makes training extremely time-consuming.

To address this, TWIST proceeds in the following steps:

1. Train a "teacher" world model on state information from the simulator.
2. At the same time, collect a matching dataset of domain-randomized image observations.
3. Transfer what the teacher has learned to a "student" world model that takes only images as input, using knowledge distillation.

The distillation covers not only static state distribution matching but also dynamic alignment of imagined trajectories, which helps the student model perform better in real environments.

Experimental results show that TWIST outperforms naive domain randomization and model-free RL methods in both simulated and real robotic tasks, in terms of sample efficiency and task performance.
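The distillation described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes latent states are diagonal Gaussians, uses a KL divergence as the matching term, and stands in a toy linear function for the learned latent dynamics. All names (`kl_diag_gaussians`, `make_linear_step`, `rollout`, `distillation_loss`, `traj_weight`) are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: a latent state is a diagonal Gaussian given as (means, stds).
Gaussian = Tuple[List[float], List[float]]
Step = Callable[[Gaussian, float], Gaussian]


def kl_diag_gaussians(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sd_p, mu_q, sd_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def make_linear_step(gain: float) -> Step:
    """Toy stand-in for a learned latent dynamics model (hypothetical)."""
    def step(z: Gaussian, action: float) -> Gaussian:
        mean, std = z
        return [gain * m + action for m in mean], list(std)
    return step


def rollout(step: Step, z0: Gaussian, actions: Sequence[float]) -> List[Gaussian]:
    """Imagine a latent trajectory by applying the dynamics once per action."""
    trajectory = []
    z = z0
    for action in actions:
        z = step(z, action)
        trajectory.append(z)
    return trajectory


def distillation_loss(
    teacher_latent: Gaussian,
    student_latent: Gaussian,
    teacher_step: Step,
    student_step: Step,
    actions: Sequence[float],
    traj_weight: float = 1.0,
) -> float:
    """Static latent matching plus alignment of imagined trajectories."""
    # Static term: match the student's latent (from images) to the teacher's (from state).
    static_term = kl_diag_gaussians(teacher_latent, student_latent)
    # Dynamic term: roll both models forward on the same actions and compare step by step.
    teacher_traj = rollout(teacher_step, teacher_latent, actions)
    student_traj = rollout(student_step, student_latent, actions)
    pairs = list(zip(teacher_traj, student_traj))
    dynamic_term = sum(kl_diag_gaussians(t, s) for t, s in pairs) / max(len(pairs), 1)
    return static_term + traj_weight * dynamic_term


# Usage: the teacher latent would come from simulator state, the student latent
# from the matching domain-randomized image (both hard-coded here).
teacher_latent = ([0.0, 1.0], [1.0, 1.0])
student_latent = ([0.2, 0.9], [1.0, 1.2])
actions = [0.1, -0.2, 0.3]
loss = distillation_loss(
    teacher_latent, student_latent,
    make_linear_step(0.9), make_linear_step(0.8), actions,
)
print(f"distillation loss: {loss:.4f}")
```

In this sketch the loss is zero only when the student reproduces both the teacher's latent distribution and its imagined rollout. In a real system the student parameters would be trained by gradient descent on a loss of this kind while the teacher stays fixed; that training loop is omitted here.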

Sim-To-Real Transfer for Miniature Autonomous Car Racing

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Real-time Policy Distillation in Deep Reinforcement Learning

Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer

TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation

Restructuring the Teacher and Student in Self-Distillation

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey

Understanding Domain Randomization for Sim-to-real Transfer

TransDreamer: Reinforcement Learning with Transformer World Models

Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies

One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

Multi-Teacher Distillation With Single Model for Neural Machine Translation

Efficient Imitation Learning with Conservative World Models

Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

Bird's Eye View Based Pretrained World model for Visual Navigation

Efficient World Models with Context-Aware Tokenization

Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Selective Cross-Task Distillation

DROID: Minimizing the Reality Gap Using Single-Shot Human Demonstration

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning