Abstract: This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance, including better tracking of a wide range of walking speeds, enabling reliable turning and traversing challenging terrains while preserving the robustness and stability of the walking gaits compared to the baseline ALIP-based MPC approach.

What problem does this paper attempt to address?

This paper aims to improve the agility and robustness of footstep planning and control for bipedal robots. Specifically, it proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL), aiming to overcome the performance limits that simplified models impose on conventional MPC methods. A learned policy bridges the gap between the simplified model and the full-order robot system, so the planner captures the robot's whole-body dynamics more effectively and produces more flexible and adaptable footstep adjustments.

### Background and Problem Description

Agile and robust walking is crucial for bipedal robots to achieve human-like performance. However, designing a bipedal robot that can continuously adjust its planned foot placements to maintain balance and perform agile, rapid maneuvers in adverse environments is a major challenge. Although conventional MPC-based footstep planning controllers perform well in dynamic walking, their performance is often limited by simplified models and assumptions.

### Solution

The paper proposes an MPC framework augmented with RL. The method adopts a hierarchical control architecture consisting of a high-level (HL) planner that integrates the MPC and the RL policy, and a low-level (LL) tracking controller. The MPC uses a simplified model (ALIP) to generate an initial sub-optimal footstep plan, while the RL policy uses the robot's full-order dynamics model to fine-tune this plan, compensating for the modeling errors of the simplified model and yielding a better footstep policy.

### Main Contributions

1. **Proposed the first bipedal footstep generation framework combining RL and MPC**: This framework significantly improves the robot's walking-speed tracking, robustness to external disturbances, walking adaptability (such as the ability to switch between different speed commands), and ability to traverse arbitrary slopes.
2. **Designed flexible reward terms**: These reward terms help the policy learn effectively from the ALIP-MPC process.
3. **Verified the effectiveness of the method**: Experimental results show that, compared with using MPC alone, the method achieves more agile, robust, and adaptable walking behaviors, particularly by overcoming the modeling errors introduced by the simplified dynamics model.

### Conclusion

The paper validates the proposed framework through a series of experiments, showing that across various walking scenarios, combining MPC and RL significantly improves the walking performance of bipedal robots, especially robustness and adaptability under external disturbances and on complex terrain.
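
The high-level planner summarized above, where a simplified-model plan is refined by a learned policy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a Raibert-style heuristic stands in for the ALIP-based MPC, the policy is a single linear layer with hypothetical weights, the correction is assumed to be additive, and the names `nominal_footstep`, `policy_residual`, and `plan_footstep` are invented.

```python
import numpy as np


def nominal_footstep(com_vel, cmd_vel, step_time=0.3, gain=0.1):
    """Stand-in for the simplified-model planner (NOT the paper's ALIP MPC).

    Raibert-style heuristic: step ahead by half the distance travelled
    during one step, plus a velocity-error feedback term.
    """
    com_vel = np.asarray(com_vel, dtype=float)
    cmd_vel = np.asarray(cmd_vel, dtype=float)
    return 0.5 * step_time * com_vel + gain * (com_vel - cmd_vel)


def policy_residual(observation, weights, max_residual=0.1):
    """Hypothetical learned policy: one linear layer squashed by tanh.

    The tanh keeps each residual component within +/- max_residual metres,
    so the learned correction cannot move the footstep arbitrarily far
    from the nominal plan.
    """
    observation = np.asarray(observation, dtype=float)
    return max_residual * np.tanh(weights @ observation)


def plan_footstep(com_vel, cmd_vel, observation, weights):
    """High-level planner: nominal footstep plus a bounded learned correction
    (additive combination is an assumption of this sketch)."""
    return nominal_footstep(com_vel, cmd_vel) + policy_residual(observation, weights)


# Example: planar (x, y) velocities in m/s; the observation layout is assumed.
com_vel = [0.5, 0.0]
cmd_vel = [0.6, 0.0]
observation = np.array([0.5, 0.0, 0.6, 0.0])
weights = np.zeros((2, 4))  # untrained policy, so the residual is zero
footstep = plan_footstep(com_vel, cmd_vel, observation, weights)
```

With zero weights the planner returns the nominal footstep unchanged, which is a convenient starting point for training: the policy only has to learn the correction that the simplified model misses. In the hierarchical scheme, the resulting footstep target would then be passed to the low-level tracking controller.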

RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control

Nonlinear MPC-Based Control Framework for Quadruped Robots: Touch-Down in Complex Terrain

Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion

Learning Agile Locomotion and Adaptive Behaviors via RL-augmented MPC

PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion

Bipedal Walking on Constrained Footholds with MPC Footstep Control

Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control

A Real-Time Planning and Control Framework for Robust and Dynamic Quadrupedal Locomotion

Robust biped locomotion using deep reinforcement learning on top of an analytical control approach

RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged Locomotion

Multi-Layered Safety for Legged Robots via Control Barrier Functions and Model Predictive Control

Reduced Model Predictive Control Toward Highly Dynamic Quadruped Locomotion

Combining model-predictive control and predictive reinforcement learning for stable quadrupedal robot locomotion

Robust Locomotion Exploiting Multiple Balance Strategies: An Observer-Based Cascaded Model Predictive Control Approach

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Robust Dynamic Locomotion via Reinforcement Learning and Novel Whole Body Controller

Learning Bipedal Walking On Planned Footsteps For Humanoid Robots

Time-Varying ALIP Model and Robust Foot-Placement Control for Underactuated Bipedal Robot Walking on a Swaying Rigid Surface

Terrain-Adaptive, ALIP-Based Bipedal Locomotion Controller via Model Predictive Control and Virtual Constraints