Abstract: Recently, reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, saving the manual effort that conventional approaches require, such as designing skill-specific controllers. However, due to the complex nonlinear dynamics of quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over a balance beam. To alleviate this difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, our generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide the policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We then optimize the trajectory generator and the policy network alternately to stabilize the training and share the exploratory data to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through a cave. To further verify the effectiveness of our approach, we deploy the controller learned in simulation on a 12-DoF quadrupedal robot, and it can successfully traverse challenging scenarios with efficient gaits. A video of the learned gaits in different tasks is available on YouTube (youtube.com/watch?v=hgBLR09MEOw), and the code is available on GitHub (github.com/PaddlePaddle/PaddleRobotics).
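The alternating scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the trajectory parameterization, the evolutionary algorithm, or the policy update, so everything below is an assumption made for illustration: the two-parameter trajectory (`step_length`, `step_height`), the simple (1+λ) evolution step, and the `toy_fitness` function standing in for an episode return are all hypothetical, and the RL policy update and the data sharing between the two optimizers are omitted.

```python
import math
import random


def foot_trajectory(phase, params):
    """Hypothetical open-loop prior: foot offset (x, z) for a phase in [0, 1)."""
    step_length, step_height = params
    x = 0.5 * step_length * math.cos(2 * math.pi * phase)
    # The foot lifts only during the swing half of the cycle.
    z = step_height * max(0.0, math.sin(2 * math.pi * phase))
    return x, z


def control_signal(phase, params, residual):
    """Final foot target = trajectory-generator prior + policy residual."""
    x, z = foot_trajectory(phase, params)
    return x + residual[0], z + residual[1]


def evolve_generator(params, fitness, rng, pop_size=16, sigma=0.02):
    """One (1+lambda) step: keep the best perturbed candidate, else the parent."""
    best, best_fit = params, fitness(params)
    for _ in range(pop_size):
        cand = tuple(p + rng.gauss(0.0, sigma) for p in params)
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best


def toy_fitness(params):
    """Stand-in for an episode return (assumption); peaks at (0.10, 0.05)."""
    return -((params[0] - 0.10) ** 2 + (params[1] - 0.05) ** 2)


def train(iterations=50, seed=0):
    rng = random.Random(seed)
    params = (0.0, 0.0)
    for _ in range(iterations):
        # Phase 1: hold the policy fixed and evolve the trajectory generator.
        params = evolve_generator(params, toy_fitness, rng)
        # Phase 2: hold the generator fixed and update the policy with RL
        # (omitted in this sketch).
    return params
```

Calling `train()` returns generator parameters whose toy fitness is never worse than the starting point, because the evolution step keeps the parent whenever no candidate improves on it. In a real setup the fitness would come from rollouts in simulation with the current policy applying its residuals through `control_signal`.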

Gait Learning Reproduction for Quadruped Robots Based on Experience Evolution Proximal Policy Optimization

Learning biped locomotion based on Q-learning and neural networks

Learning Gait-conditioned Bipedal Locomotion with Motor Adaptation

Gait Learning of Quadruped Robot Based on Deep Arbitration Strategy

Behavior evolution-inspired approach to walking gait reinforcement training for quadruped robots

A parallel heterogeneous policy deep reinforcement learning algorithm for bipedal walking motion design

Reinforcement Learning With Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion

Heuristic Gait Learning of Quadruped Robot Based on Deep Deterministic Policy Gradient Algorithm

Learning Gait of Quadruped Robot Without Prior Knowledge of the Environment

Efficient Learning of Control Policies for Robust Quadruped Bounding using Pretrained Neural Networks

Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits

Deep Reinforcement Learning Based Co-Optimization of Morphology and Gait for Small-Scale Legged Robot

Quadruped Robot Get Bionic Learning Method Based on Intelligent Memory Soft Actor-Critic

Economical Quadrupedal Multi-Gait Locomotion via Gait-Heuristic Reinforcement Learning

Learning Smooth and Omnidirectional Locomotion for Quadruped Robots

Estimating Probability Distribution with Q-learning for Biped Gait Generation and Optimization.

ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots

Bipedal Walking Robot using Deep Deterministic Policy Gradient

Estimating Biped Gait Using Spline-Based Probability Distribution Function with Q-Learning.

ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization

Efficient Learning of Robust Quadruped Bounding Using Pretrained Neural Networks