Abstract: This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance over the baseline ALIP-based MPC approach, including better tracking across a wide range of walking speeds, reliable turning, and traversal of challenging terrain, while preserving the robustness and stability of the walking gaits.
What problem does this paper attempt to address?
This paper aims to make bipedal footstep planning and control more agile and robust. It proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to overcome the performance limits that simplified models impose on traditional MPC methods. A learned policy bridges the gap between the simplified model and the complex full-order robot system, so the planner captures the robot's whole-body dynamics more effectively and produces more flexible, adaptable footstep adjustments.
### Background and Problem Description of the Paper
Agile and robust walking is crucial for bipedal robots to reach human-like performance. A major challenge is designing a controller that continuously adjusts the planned footstep positions so the robot keeps its balance while performing agile, rapid motions in adverse environments. Although traditional MPC-based footstep planning controllers perform well in dynamic walking, their performance is often limited by simplified models and assumptions.
### Solution
The paper proposes an enhanced MPC framework that combines MPC and RL. The method adopts a hierarchical control architecture consisting of a high-level (HL) planner, which integrates the MPC and the RL policy, and a low-level (LL) tracking controller. The MPC uses a simplified model to generate an initial, sub-optimal footstep plan. The RL policy, which is informed by the robot's full-order dynamics, then fine-tunes this plan to compensate for the modeling errors of the simplified model and produce a better footstep policy.
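The division of labor inside the high-level planner can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the summary does not say how the learned adjustment is applied, so an additive, clipped residual is assumed here; `nominal_footstep` is a Raibert-style heuristic standing in for the ALIP-based MPC (it is not the paper's solver); and the names, gains, and the 5 cm bound are hypothetical.

```python
import numpy as np


def nominal_footstep(com_vel, cmd_vel, step_time=0.3, gain=0.1):
    """Stand-in for the ALIP-based MPC (assumption: Raibert-style heuristic).

    Returns a 2D foot placement offset (x, y) in meters relative to the CoM.
    """
    com_vel = np.asarray(com_vel, dtype=float)
    cmd_vel = np.asarray(cmd_vel, dtype=float)
    return 0.5 * step_time * com_vel + gain * (com_vel - cmd_vel)


def refine_footstep(nominal, residual, max_adjust=0.05):
    """Add a bounded learned correction to the nominal footstep.

    Assumption: the policy output is an additive residual, clipped so the
    refined footstep stays close to the model-based plan.
    """
    residual = np.clip(np.asarray(residual, dtype=float), -max_adjust, max_adjust)
    return np.asarray(nominal, dtype=float) + residual


def high_level_plan(com_vel, cmd_vel, policy):
    """High-level planner: model-based nominal step, then learned refinement."""
    nominal = nominal_footstep(com_vel, cmd_vel)
    # Hypothetical observation layout: CoM velocity, command, nominal footstep.
    observation = np.concatenate([com_vel, cmd_vel, nominal])
    return refine_footstep(nominal, policy(observation))


if __name__ == "__main__":
    # A dummy policy that always asks for a large forward correction.
    dummy_policy = lambda obs: np.array([0.2, -0.01])
    print(high_level_plan([0.5, 0.0], [0.5, 0.0], dummy_policy))
```

In this sketch the low-level tracking controller would receive the refined footstep as its target. Clipping the residual is one way to keep the learned correction from overriding the model-based plan, though the paper may bound or shape the adjustment differently.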
### Main Contributions
1. **Proposed the first bipedal footstep generation framework combining RL and MPC**: The framework significantly improves the robot's walking-speed tracking, robustness to external disturbances, walking adaptability (such as switching between different speed commands), and ability to traverse arbitrary slopes.
2. **Designed flexible reward terms**: These reward terms help the policy learn effectively from the ALIP-MPC process.
3. **Verified the effectiveness of the method**: Experimental results show that, compared with using MPC alone, the method achieves more agile, robust, and adaptable walking behaviors, particularly by overcoming the modeling errors introduced by the simplified dynamics model.
### Conclusion
The paper verifies the effectiveness of the proposed framework through a series of experiments, showing that in various walking scenarios, the method of combining MPC and RL can significantly improve the walking performance of biped robots, especially the robustness and adaptability in the face of external disturbances and complex terrains.