Abstract: We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque references. Hence, MIMOC does not require fine-tuning. MIMOC is also less sensitive to modeling and state estimation inaccuracies than model-based controllers. We validate MIMOC on the Mini-Cheetah in outdoor environments over a wide variety of challenging terrain, and on the MIT Humanoid in simulation. We show cases where MIMOC outperforms model-based optimal controllers, and show that imitating torque references improves the policy's performance.

What problem does this paper attempt to address?

This paper aims to make motion control for legged robots more efficient and robust. It proposes a new Reinforcement Learning (RL) controller, MIMOC (Motion Imitation from Model-Based Optimal Control), which learns agile locomotion by imitating reference trajectories produced by model-based optimal control. MIMOC addresses several key challenges in existing motion-imitation RL methods:

1. **Dynamic Consistency**: The reference trajectories used by MIMOC are dynamically consistent, meaning they are physically feasible for the robot and do not require motion retargeting.
2. **No Fine-Tuning Required**: Because the reference trajectories already account for the robot's dynamics, MIMOC does not need to be fine-tuned for each specific task.
3. **Reduced Sensitivity to Model and State Estimation Inaccuracies**: Compared with model-based controllers, MIMOC is more robust to model and state estimation inaccuracies because it is trained with noisy observations and randomized physical parameters.

The paper validates MIMOC through outdoor experiments on the Mini-Cheetah robot and simulations of the MIT Humanoid. The results show that MIMOC performs well on a variety of challenging terrain and, in some cases, outperforms model-based optimal controllers. The paper also shows that torque-tracking rewards improve policy performance.

### Main Contributions of the Paper

1. **Dynamically Consistent Reference Trajectories**: MIMOC uses reference trajectories generated by model-based optimal controllers. These trajectories include not only position and velocity but also torque data, which makes learning more efficient and accurate.
2. **Robustness**: MIMOC improves robustness to state estimation inaccuracies by introducing noise and randomization during training.
3. **Generalization Ability**: MIMOC performs well in simulation and has also been deployed on real hardware (the Mini-Cheetah), where it operates stably on a variety of challenging terrain.
4. **Simplified Deployment**: MIMOC does not require complex motion retargeting or domain adaptation, which simplifies transfer from simulation to hardware.

### Solutions to Specific Problems

- **Motion Retargeting Problem**: The reference trajectories used by MIMOC are generated for the robot itself, so no motion retargeting is required.
- **Model and State Estimation Inaccuracies**: By introducing noise and randomization during training, MIMOC improves its robustness to model and state estimation inaccuracies.
- **Training Efficiency**: The introduction of torque-tracking rewards significantly improves training efficiency and reduces training time.

In conclusion, MIMOC provides an efficient and robust motion control method for legged robots by combining the advantages of model-based optimal control and reinforcement learning.
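To make the idea of a torque-tracking reward more concrete, the following is a minimal sketch of an imitation reward that scores joint position, joint velocity, and joint torque against a reference trajectory. It is not the paper's reward function: the exponential kernel, the function names, the dictionary keys (`q`, `qd`, `tau`), and all weights and scales are assumptions chosen for illustration.

```python
import numpy as np

def tracking_term(actual, reference, scale):
    """Exponential kernel: 1.0 for perfect tracking, decaying toward 0 as squared error grows."""
    diff = np.asarray(actual, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.exp(-scale * np.sum(diff ** 2)))

def imitation_reward(state, ref, weights=None, scales=None):
    """Weighted sum of tracking terms for joint positions, velocities, and torques.

    `state` and `ref` are dicts with keys "q", "qd", and "tau".
    The default weights and scales are hypothetical, not values from the paper.
    """
    weights = weights or {"q": 0.4, "qd": 0.2, "tau": 0.4}
    scales = scales or {"q": 5.0, "qd": 0.1, "tau": 0.01}
    return sum(weights[k] * tracking_term(state[k], ref[k], scales[k]) for k in weights)

# Usage: 12 joints, reference torque of 1 N·m per joint.
ref = {"q": np.zeros(12), "qd": np.zeros(12), "tau": np.ones(12)}
perfect = imitation_reward(ref, ref)            # policy matches the reference exactly
off_torque = imitation_reward(dict(ref, tau=np.zeros(12)), ref)  # torque mismatch only
```

In this sketch, a policy that matches the reference kinematics but applies different torques still loses reward through the `tau` term, which is the kind of signal a torque reference can provide and a purely kinematic reference cannot.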

Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control
