Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control

AJ Miller, Shamel Fahmi, Matthew Chignoli, Sangbae Kim
2023-05-18
Abstract: We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque references. Hence, MIMOC does not require fine-tuning. MIMOC is also less sensitive to modeling and state estimation inaccuracies than model-based controllers. We validate MIMOC on the Mini-Cheetah in outdoor environments over a wide variety of challenging terrain, and on the MIT Humanoid in simulation. We show cases where MIMOC outperforms model-based optimal controllers, and show that imitating torque references improves the policy's performance.
Robotics
What problem does this paper attempt to address?
The paper aims to make motion control for legged robots more efficient and robust. It proposes MIMOC (Motion Imitation from Model-Based Optimal Control), a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories generated by model-based optimal control. MIMOC addresses several key challenges in existing motion-imitation RL methods:

1. **Dynamic Consistency**: The reference trajectories used by MIMOC are dynamically consistent, meaning they are physically feasible for the robot, and they require no motion retargeting.
2. **No Fine-Tuning Required**: Because the reference trajectories already account for the robot's dynamics, MIMOC does not need to be fine-tuned for each specific task.
3. **Reduced Sensitivity to Model and State Estimation Inaccuracies**: Compared with model-based controllers, MIMOC is more robust to model and state estimation inaccuracies because its training process includes noisy observations and randomized physical parameters.

The paper validates MIMOC through outdoor experiments on the Mini-Cheetah robot and simulations of the MIT Humanoid. The results show that MIMOC performs well across a variety of challenging terrains and, in some cases, outperforms model-based optimal controllers. The paper also emphasizes the importance of torque-tracking rewards in improving policy performance.

### Main Contributions of the Paper

1. **Dynamically Consistent Reference Trajectories**: MIMOC uses reference trajectories generated by model-based optimal controllers. These trajectories include not only position and velocity but also torque data, which makes the learning process more efficient and accurate.
2. **Robustness**: MIMOC improves robustness to state estimation inaccuracies by introducing noise and randomization during training.
3. **Generalization Ability**: MIMOC performs well in simulation and has also been deployed on real hardware (the Mini-Cheetah), where it operates stably on a variety of challenging terrains.
4. **Simplified Deployment**: MIMOC does not require complex motion retargeting or domain adaptation, which simplifies the transfer from simulation to real hardware.

### Solutions to Specific Problems

- **Motion Retargeting Problem**: The reference trajectories used by MIMOC are generated specifically for the robot, so no motion retargeting is required.
- **Model and State Estimation Inaccuracies**: By introducing noise and randomization during training, MIMOC improves its robustness to model and state estimation inaccuracies.
- **Training Efficiency**: The introduction of torque-tracking rewards significantly improves training efficiency and reduces training time.

In conclusion, MIMOC provides an efficient and robust motion control method for legged robots by combining the advantages of model-based optimal control and reinforcement learning.
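
The summary above names two training ingredients: a reward that tracks the reference torques alongside positions and velocities, and noisy observations with randomized physical parameters. The Python sketch below illustrates one plausible shape for both. It is not the paper's implementation: the exponential tracking kernel, the weights and scales, the dictionary keys (`q`, `qd`, `tau`), and all function names are assumptions chosen for illustration.

```python
import math
import random


def tracking_term(actual, reference, scale):
    """Exponential tracking kernel (assumed form): 1.0 on a perfect match,
    decaying toward 0 as the squared error grows."""
    sq_err = sum((a - r) ** 2 for a, r in zip(actual, reference))
    return math.exp(-scale * sq_err)


def imitation_reward(state, ref, weights=None, scales=None):
    """Weighted sum of joint position (q), velocity (qd), and torque (tau)
    tracking terms. Weights and scales are placeholders, not paper values."""
    weights = weights or {"q": 0.4, "qd": 0.3, "tau": 0.3}
    scales = scales or {"q": 5.0, "qd": 0.1, "tau": 0.01}
    return sum(
        weights[k] * tracking_term(state[k], ref[k], scales[k]) for k in weights
    )


def noisy_observation(obs, sigma, rng):
    """Add zero-mean Gaussian noise to each observation entry."""
    return [x + rng.gauss(0.0, sigma) for x in obs]


def randomized_params(nominal, fraction, rng):
    """Scale each nominal physical parameter by a uniform factor in
    [1 - fraction, 1 + fraction], drawn once per training episode."""
    return {
        name: value * rng.uniform(1.0 - fraction, 1.0 + fraction)
        for name, value in nominal.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical reference sample for a three-joint leg.
    ref = {"q": [0.1, -0.8, 1.6], "qd": [0.0, 0.5, -0.5], "tau": [2.0, -4.0, 6.0]}
    state = {
        "q": noisy_observation(ref["q"], 0.02, rng),
        "qd": noisy_observation(ref["qd"], 0.1, rng),
        "tau": [0.0, 0.0, 0.0],  # policy applies no torque at all
    }
    print("reward:", imitation_reward(state, ref))
    print("params:", randomized_params({"mass": 9.0, "friction": 0.8}, 0.2, rng))
```

With a torque term in the reward, a policy that matches the reference joint angles but applies very different torques scores lower than one that matches both, which is the intuition behind imitating torque references.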