Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
What problem does this paper attempt to address?
The main problem this paper addresses is how to control bipedal robots, which have high-dimensional, nonlinear dynamics, so that they achieve diverse, agile, and robust locomotion skills. Specifically, the authors aim to:
1. **Establish a general bipedal locomotion control framework**: Develop a general reinforcement learning (RL) framework that applies to a range of dynamic bipedal skills. The framework covers periodic skills (walking and running), aperiodic skills (jumping), and stationary skills (standing). A controller trained this way can be deployed directly on the real robot without further tuning or training.
2. **Design a novel RL control policy**: Propose a dual-history architecture that feeds both the long-term and the short-term input/output (I/O) history of the robot into a non-recurrent RL policy. Combined with the proposed training approach, this architecture outperforms the other methods compared in the paper at learning dynamic bipedal locomotion control, in both simulation and real-world experiments.
3. **Study the adaptivity of RL controllers**: Through detailed empirical study, examine how control policies developed with RL adapt. The study shows that the policies adapt not only to time-invariant shifts in dynamics but also to time-variant changes such as contact events. These results are verified in simulation and further confirmed by several real-world zero-shot transfer experiments, such as in-place walking and jumping to a target.
4. **Improve the robustness of RL controllers**: Identify task randomization as an additional source of robustness: training policies on a wide range of tasks improves task generalization and makes the robot compliant to disturbances. This differs from the commonly used dynamics randomization, and its effect is verified in simulation and in real-world experiments.
5. **Extensive real-world verification and demonstration**: Use Cassie, a torque-controlled, human-sized bipedal robot, to reproduce a variety of locomotion skills in the real world. Cassie tracks varying commands with very small tracking errors while remaining robust to unexpected disturbances. The experiments also demonstrate capabilities new to bipedal robots: robust standing recovery using different skills, consistent and robust walking over a long period, a 400-meter dash, and diverse bipedal jumps, including standing long jumps and high jumps.
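The dual-history idea in item 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DualHistoryBuffer`, the helper functions, the window lengths, and the mean-pooling stand-in for a learned long-history encoder are all assumptions made for illustration. It only shows how one buffer of past (observation, action) pairs can supply a long window and a short window to a non-recurrent policy.

```python
from collections import deque
from typing import Callable, List, Sequence


class DualHistoryBuffer:
    """Stores recent (observation, action) pairs and exposes two windows.

    long_len and short_len are illustrative defaults, not values from the paper.
    """

    def __init__(self, obs_dim: int, act_dim: int,
                 long_len: int = 8, short_len: int = 2) -> None:
        if short_len > long_len:
            raise ValueError("short_len must not exceed long_len")
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.short_len = short_len
        # Pre-fill with zeros so both windows have a fixed size from step 0.
        zero_pair = [0.0] * (obs_dim + act_dim)
        self._pairs = deque((list(zero_pair) for _ in range(long_len)),
                            maxlen=long_len)

    def push(self, obs: Sequence[float], act: Sequence[float]) -> None:
        """Append the newest I/O pair; the oldest one is dropped."""
        if len(obs) != self.obs_dim or len(act) != self.act_dim:
            raise ValueError("unexpected observation or action size")
        self._pairs.append(list(obs) + list(act))

    def long_history(self) -> List[List[float]]:
        """All stored pairs, oldest first."""
        return [list(p) for p in self._pairs]

    def short_history(self) -> List[List[float]]:
        """Only the most recent short_len pairs, oldest first."""
        return [list(p) for p in list(self._pairs)[-self.short_len:]]


def mean_encoder(history: List[List[float]]) -> List[float]:
    """Stand-in for a learned long-history encoder (assumption): column means."""
    n = len(history)
    return [sum(col) / n for col in zip(*history)]


def build_policy_input(buf: DualHistoryBuffer,
                       command: Sequence[float],
                       encoder: Callable[[List[List[float]]], List[float]]
                       ) -> List[float]:
    """Concatenate encoded long history, raw short history, and the command."""
    short_flat = [x for pair in buf.short_history() for x in pair]
    return encoder(buf.long_history()) + short_flat + list(command)
```

In this sketch the short window bypasses the encoder and reaches the policy unprocessed, so the most recent feedback is not blurred by whatever compression the long-history encoder applies.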
Overall, the paper uses deep reinforcement learning to tackle the challenge of building robust and versatile bipedal locomotion controllers, and it offers insights and guidance for future bipedal robot research.
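The distinction between task randomization and dynamics randomization drawn in item 4 can be sketched as per-episode sampling from two separate sets of ranges. Every name and numeric range below is hypothetical and chosen only for illustration; the paper's actual commands, parameters, and ranges are not given in this summary.

```python
import random
from typing import Dict, Tuple

Range = Tuple[float, float]

# Hypothetical command ranges: WHAT the robot is asked to do.
TASK_RANGES: Dict[str, Range] = {
    "forward_speed_mps": (-1.0, 1.5),
    "lateral_speed_mps": (-0.5, 0.5),
    "turn_rate_radps": (-0.8, 0.8),
}

# Hypothetical physical ranges: HOW the simulated world behaves.
DYNAMICS_RANGES: Dict[str, Range] = {
    "ground_friction": (0.4, 1.2),
    "link_mass_scale": (0.8, 1.2),
}


def sample_uniform(ranges: Dict[str, Range],
                   rng: random.Random) -> Dict[str, float]:
    """Draw one value per entry, uniformly within its (low, high) range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def sample_episode(rng: random.Random) -> Dict[str, Dict[str, float]]:
    """Sample a commanded task and a set of dynamics for one training episode."""
    return {
        "task": sample_uniform(TASK_RANGES, rng),          # task randomization
        "dynamics": sample_uniform(DYNAMICS_RANGES, rng),  # dynamics randomization
    }
```

Passing a seeded `random.Random` keeps episode sampling reproducible, which makes it easier to compare training runs with task randomization on and off.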