Gait Learning of Quadruped Robot Based on Deep Arbitration Strategy
ZHU Xiaoqing, CHEN Jiangtao, ZHANG Siyuan, LIU Xinyuan, RUAN Xiaogang
DOI: https://doi.org/10.15918/j.tbit1001-0645.2022.213
2023-01-01
Abstract: Reproducing the learning process of higher organisms is an important research direction in robotics. Several commonly used reinforcement learning algorithms based on actor-critic (AC) networks have been explored for this task, but their remaining shortcomings have prompted further improvements. In the deep deterministic policy gradient (DDPG) algorithm, overestimation of the Q value degrades the learning effect. Inspired by the arbitration mechanism in the prefrontal cortex of the brain, a deep arbitration actor-critic (DAAC) algorithm with two sets of evaluation networks is proposed. Through the arbitration mechanism, the optimal evaluation network is selected to update the policy parameters, which effectively addresses the Q-value overestimation problem. The algorithm enables a quadruped robot to reproduce the bionic gait learning process. In simulation experiments, DAAC was compared with three algorithms: DDPG, soft actor-critic (SAC), and proximal policy optimization (PPO). The experimental results show that the gait of the quadruped robot trained by DAAC performs better in three respects (reward value, robot stability, and speed), verifying the superiority of the algorithm.
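The Q-value overestimation mentioned in the abstract can be illustrated with a small numerical sketch. The abstract does not state DAAC's arbitration rule, so the code below is not the authors' method. It assumes the simplest possible rule, keeping the more conservative of two critic estimates, as in clipped double Q-learning. The function name `overestimation_demo` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def overestimation_demo(n_actions=10, n_trials=20000, noise_std=1.0, seed=0):
    """Compare the bias of a single noisy critic with a two-critic estimate.

    Assumption (not from the paper): the true Q value of every action is 0,
    so any nonzero average estimate is pure estimation bias.
    """
    rng = np.random.default_rng(seed)

    # Two independent noisy critics estimating the same (all-zero) Q values.
    q1 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q2 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Single critic: taking the max over noisy estimates is biased upward.
    single = q1.max(axis=1).mean()

    # Two critics: critic 1 picks the greedy action, then the smaller of the
    # two estimates for that action is kept (hypothetical arbitration rule).
    rows = np.arange(n_trials)
    best = q1.argmax(axis=1)
    arbitrated = np.minimum(q1[rows, best], q2[rows, best]).mean()

    return single, arbitrated


if __name__ == "__main__":
    single, arbitrated = overestimation_demo()
    print(f"single-critic bias: {single:.3f}")
    print(f"two-critic bias:    {arbitrated:.3f}")
```

Under these assumptions the single-critic estimate sits well above the true value of 0, while the two-critic estimate stays close to it. An actual arbitration mechanism could use a different selection criterion, such as the reliability of each evaluation network.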