Abstract: Recently, reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, as it can save the manual effort required by conventional approaches, such as designing skill-specific controllers. However, due to the complex nonlinear dynamics of quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over a balance beam. To alleviate this difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, our generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We then optimize the trajectory generator and the policy network alternately to stabilize training, and share the exploratory data between them to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through a cave. To further verify the effectiveness of our approach, we deploy the controller learned in simulation on a 12-DoF quadrupedal robot, and it successfully traverses challenging scenarios with efficient gaits. A video of the learned gaits in different tasks is available on YouTube [Online]. Available: youtube.com/watch?v=hgBLR09MEOw, and the code is available on GitHub: github.com/PaddlePaddle/PaddleRobotics
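The two ideas in the abstract, a trajectory generator whose shape is evolved for the task and a policy that adds a residual on top of it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-parameter trajectory shape (step length and step height), the elite-mean evolution step, and the toy fitness function standing in for episode return are all assumptions made for illustration. The RL update of the policy is omitted, and the residual is a fixed vector here.

```python
import numpy as np

def foot_trajectory(phase, params):
    """Hypothetical parametric foot trajectory.

    phase is in [0, 1); params = (step_length, step_height).
    Returns the (x, z) foot offset for one leg.
    """
    step_length, step_height = params
    x = 0.5 * step_length * np.cos(2.0 * np.pi * phase)
    # The foot is lifted only during the first half of the cycle (swing).
    z = step_height * max(0.0, np.sin(2.0 * np.pi * phase))
    return np.array([x, z])

def combined_target(phase, params, residual):
    """Generator output plus the policy's residual control signal."""
    return foot_trajectory(phase, params) + residual

def evolve_params(params, fitness_fn, rng, pop_size=16, sigma=0.02):
    """One step of a simple elite-mean evolution strategy (an assumption,
    not necessarily the evolutionary method used in the paper)."""
    noise = rng.standard_normal((pop_size, params.size))
    candidates = params + sigma * noise
    scores = np.array([fitness_fn(c) for c in candidates])
    n_elite = max(1, pop_size // 4)
    elite = candidates[np.argsort(scores)[-n_elite:]]
    return elite.mean(axis=0)

def toy_fitness(params, best=np.array([0.20, 0.08])):
    """Toy stand-in for episode return: higher when closer to `best`."""
    return -float(np.sum((params - best) ** 2))

def run_demo(iterations=50, seed=0):
    """Alternate generator evolution with a (omitted) policy update."""
    rng = np.random.default_rng(seed)
    params = np.array([0.10, 0.05])
    for _ in range(iterations):
        params = evolve_params(params, toy_fitness, rng)
        # A policy update with RL on the shared rollout data would go here.
    return params
```

In this sketch the generator supplies the motion prior and the residual only corrects it, which is the division of labor the abstract describes; in a real system the fitness would come from rollouts in simulation rather than a closed-form function.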

Reinforcement Learning With Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion
