Heuristic Gait Learning of Quadruped Robot Based on Deep Deterministic Policy Gradient Algorithm

Mingchao Wang, Xiaogang Ruan, Xiaoqing Zhu
DOI: https://doi.org/10.1109/cac51589.2020.9326973
2020-01-01
Abstract: Gait control of quadruped robots has long been an active topic in robotics research. Traditional control methods have notable limitations, such as low intelligence and poor autonomy. With the development of artificial intelligence, applying reinforcement learning so that a quadruped robot learns its strategy autonomously offers a promising solution. The deep deterministic policy gradient (DDPG) algorithm performs well on continuous control tasks, but such value-based reinforcement learning algorithms tend to overestimate values when performing function approximation, which can result in a poor policy. To address this problem, this paper proposes a heuristic gait learning method for quadruped robots based on DDPG. Inspired by the Double Q-learning algorithm, the method uses two independent critics and takes the smaller of their two value estimates when updating the parameters. Experiments on the OpenAI Gym platform show that the proposed improved DDPG algorithm achieves better performance.
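The abstract does not give the exact update rule, so the following is only a minimal sketch of how "two independent critics, select the smaller value" is commonly turned into a temporal-difference target. It assumes the standard form y = r + gamma * min(Q1', Q2') with no bootstrapping on terminal transitions. The function name min_critic_targets, the discount factor gamma, and the terminal handling are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def min_critic_targets(
    rewards: Sequence[float],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Return TD targets that bootstrap from the smaller of two critic estimates.

    Assumed form (not confirmed by the paper): y = r + gamma * min(Q1', Q2'),
    where Q1' and Q2' are the two critics' estimates for the next state-action
    pair. Terminal transitions (done=True) use the reward alone.
    """
    targets = []
    for r, q1, q2, done in zip(rewards, next_q1, next_q2, dones):
        # Taking the minimum counteracts the upward bias of a single critic.
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


# Hypothetical batch of three transitions; the last one is terminal.
targets = min_critic_targets(
    rewards=[1.0, 0.5, -1.0],
    next_q1=[10.0, 4.0, 7.0],
    next_q2=[8.0, 6.0, 9.0],
    dones=[False, False, True],
    gamma=0.9,
)
```

In this sketch both critics would be regressed toward the same targets. Because the target never exceeds what either critic alone would produce, an overly optimistic estimate from one critic is not propagated into the update.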