Learning Locomotion for Quadruped Robots via Distributional Ensemble Actor-Critic

Sicen Li, Yiming Pang, Panju Bai, Jiawei Li, Zhaojin Liu, Shihao Hu, Liquan Wang, Gang Wang
DOI: https://doi.org/10.1109/lra.2024.3349934
IF: 5.2
2024-02-01
IEEE Robotics and Automation Letters
Abstract: Domain randomization introduces perturbations in the simulation to make controllers less susceptible to the reality gap, which enables remarkable sim-to-real transfer on real quadruped robots. However, aleatoric uncertainty originating from perturbations could often lead to suboptimal controllers. In this work, we present a novel algorithm called Distributional Ensemble Actor-Critic (DEAC) that blends three ideas: distributional representation of a critic, lower bounds of the value distribution, and ensembling of multiple critics and actors. Distributional representation and ensembling provide reasonable uncertainty estimates, while lower bounds of the value distribution offer finer-grained error control. The simulation results show that the controller trained by DEAC outperforms the other baselines in the domain randomization setting. The trained controller is deployed on an A1-like robot, demonstrating high-speed running and the ability to traverse diverse terrains such as slippery plates, grassland, and wet dirt.
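The abstract does not give the formula DEAC uses, so the following is only a rough illustration of how the three ideas could fit together, not the authors' method. It assumes each critic in the ensemble outputs a list of quantile estimates of the return for one state-action pair, and it forms a pessimistic value as the pooled mean minus a multiple of the pooled standard deviation. The function name `lower_bound_value` and the coefficient `beta` are hypothetical.

```python
from statistics import fmean, pstdev


def lower_bound_value(quantiles, beta=1.0):
    """Pessimistic value estimate from an ensemble of distributional critics.

    quantiles: list of lists; quantiles[i][j] is the j-th quantile estimate
               of the return produced by critic i (assumed representation).
    beta:      hypothetical coefficient controlling how conservative the
               estimate is; beta = 0 recovers the plain mean.
    """
    # Pool every quantile estimate from every critic into one sample.
    pooled = [q for critic in quantiles for q in critic]
    # The spread of the pooled sample mixes the within-critic spread
    # (return variability) and the disagreement between critics.
    return fmean(pooled) - beta * pstdev(pooled)


# Two critics, three quantiles each (made-up numbers for illustration).
ensemble = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
print(lower_bound_value(ensemble, beta=0.0))  # plain mean of the pooled estimates
print(lower_bound_value(ensemble, beta=1.0))  # more conservative estimate
```

A larger `beta` makes the estimate more conservative; when all critics agree on a single deterministic return, the spread is zero and the lower bound coincides with the mean.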
robotics