Enhancing Model-Based Step Adaptation for Push Recovery through Reinforcement Learning of Step Timing and Region

Tobias Egle, Yashuai Yan, Dongheui Lee, Christian Ott
2024-11-02
Abstract: This paper introduces a new approach to enhance the robustness of humanoid walking under strong perturbations, such as substantial pushes. Effective recovery from external disturbances requires bipedal robots to dynamically adjust their stepping strategies, including footstep positions and timing. Unlike most advanced walking controllers that restrict footstep locations to a predefined convex region, substantially limiting recoverable disturbances, our method leverages reinforcement learning to dynamically adjust the permissible footstep region, expanding it to a larger, effectively non-convex area and allowing cross-over stepping, which is crucial for counteracting large lateral pushes. Additionally, our method adapts footstep timing in real time to further extend the range of recoverable disturbances. Based on these adjustments, feasible footstep positions and DCM trajectory are planned by solving a QP. Finally, we employ a DCM controller and an inverse dynamics whole-body control framework to ensure the robot effectively follows the trajectory.
Robotics, Systems and Control
### What problem does this paper attempt to solve?

This paper addresses how to make humanoid walking more robust to strong external perturbations, such as large pushes. It proposes using reinforcement learning to adjust the step timing and the permissible footstep region dynamically, which improves the robot's ability to recover from disturbances.

#### Main problem description:

1. **Limitations of existing methods**:
   - **Restrictive footstep region**: Most advanced walking controllers confine footstep locations to a predefined convex region, which substantially limits the range of recoverable disturbances.
   - **Fixed step timing**: Existing model-based methods usually cannot adjust step timing in real time, which limits how well they adapt to disturbances of different magnitudes and directions.
2. **Objectives**:
   - **Expand the footstep region**: Allow an effectively non-convex region and cross-over stepping so that the robot can better cope with large lateral pushes.
   - **Adjust step timing in real time**: Adapt the step timing online to further extend the range of recoverable disturbances.
   - **Optimize trajectory planning**: Combine model predictive control with an inverse dynamics whole-body control framework so that the robot can effectively follow the planned trajectory.

#### Solution overview:

- **Application of reinforcement learning**: Use reinforcement learning to dynamically adjust the key parameters of step timing and footstep region, such as the step frequency, the single-support percentage, and the rotation angle of the footstep region.
- **Improved model predictive control**: Solve a quadratic program (QP) to plan feasible footstep positions and the divergent component of motion (DCM) trajectory, so that the optimal solution is found within the extended, effectively non-convex region.
- **Integrated control system**: Combine the DCM controller with the inverse dynamics whole-body control framework so that the robot tracks the planned trajectory and maintains balance.

#### Experimental verification:

The paper evaluates the method experimentally. The setup includes training the reinforcement learning policy with the PPO algorithm, random push tests in simulation, and a comparison against a baseline method. The reported results show that the new method significantly improves the robot's recovery ability and robustness across a variety of pushes.

In conclusion, by combining reinforcement learning with improved model predictive control, the paper addresses the limited push-recovery ability of humanoid robots under strong perturbations, in particular the restrictions on the footstep region and the step timing.
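
To make the quantities in the solution overview more concrete, the following is a minimal Python sketch, not the authors' implementation. It uses the standard linear inverted pendulum relations for the DCM (ξ = x + ẋ/ω with ω = √(g/z)) and shows how rotating a rectangular footstep region by a policy-chosen angle changes where a step may land. The function names, the rectangular region shape, and the example dimensions are all assumptions made for illustration. The simple projection onto the rotated rectangle stands in for the QP described in the paper, which additionally plans the DCM trajectory.

```python
import math

GRAVITY = 9.81  # m/s^2


def dcm(com_pos, com_vel, com_height):
    """Divergent component of motion of a linear inverted pendulum:
    xi = x + x_dot / omega, with omega = sqrt(g / z)."""
    omega = math.sqrt(GRAVITY / com_height)
    return tuple(p + v / omega for p, v in zip(com_pos, com_vel))


def predict_dcm(xi0, cop, com_height, duration):
    """DCM after `duration` seconds, assuming a constant centre of pressure:
    xi(t) = cop + (xi0 - cop) * exp(omega * t)."""
    omega = math.sqrt(GRAVITY / com_height)
    growth = math.exp(omega * duration)
    return tuple(c + (x - c) * growth for x, c in zip(xi0, cop))


def project_to_rotated_box(target, center, half_extents, angle):
    """Closest point to `target` inside a rectangle centred at `center`
    and rotated by `angle` radians (hypothetical footstep region)."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = target[0] - center[0], target[1] - center[1]
    # Express the target in the rectangle's local frame.
    lx, ly = c * dx + s * dy, -s * dx + c * dy
    # Clamp each local coordinate to the rectangle's half extents.
    lx = max(-half_extents[0], min(half_extents[0], lx))
    ly = max(-half_extents[1], min(half_extents[1], ly))
    # Rotate the clamped point back into the world frame.
    return (center[0] + c * lx - s * ly, center[1] + s * lx + c * ly)


if __name__ == "__main__":
    # Assumed example values: a lateral push gives the CoM sideways velocity.
    xi_now = dcm(com_pos=(0.0, 0.0), com_vel=(0.1, 0.6), com_height=0.9)
    xi_end = predict_dcm(xi_now, cop=(0.0, 0.0), com_height=0.9, duration=0.3)
    # Compare the reachable landing point for two region rotation angles.
    for angle in (0.0, math.radians(30.0)):
        step = project_to_rotated_box(xi_end, (0.0, 0.2), (0.3, 0.1), angle)
        print(f"angle={math.degrees(angle):.0f} deg -> step at {step}")
```

Because the rectangle itself is convex for any fixed angle, the planning problem stays a convex QP at each step; the effectively non-convex reachable area arises from letting the learned policy choose the angle, so the union of all rotated rectangles is covered.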