Learning biped locomotion based on Q-learning and neural networks

Peng Ziqiang, Pan Gang, Yu Ling
DOI: https://doi.org/10.1007/978-3-642-25553-3_39
2011-01-01
Abstract: Robot postures are transformed continuously until an impact occurs. To solve the continuous-state problem, a Q-Learning controller based on Back-Propagation (BP) Neural Networks is designed. Instead of a Q-table, a Multi-input Multi-output BP Neural Network is employed to compute Q-values for continuous states. An eligibility trace is used to solve the time reliability problem in Q-Learning, and we integrate the eligibility trace algorithm into the gradient descent method for continuous states. To avoid dimension explosion, an inverted pendulum pose-energy model is built to reduce the dimension of the input state space. To balance "explore" and "exploit" in Q-Learning, we use a new ε-greedy method with a variable stochastic probability, which decreases as the step number increases. Simulation results indicate that the proposed method is effective. © 2011 Springer-Verlag.
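As an illustration of the ε-greedy idea mentioned in the abstract, the following is a minimal sketch of action selection with an exploration probability that shrinks as the step number grows. The abstract does not give the decay formula, so the schedule `eps0 / (1 + decay * step)`, the parameter values, and the function names `epsilon_at` and `select_action` are all assumptions made for this example, not the authors' method.

```python
import random


def epsilon_at(step, eps0=0.5, decay=0.01):
    """Exploration probability at a given step.

    Assumed schedule (not taken from the paper): a hyperbolic decay
    that starts at eps0 and decreases monotonically with the step number.
    """
    return eps0 / (1.0 + decay * step)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy selection over a list of Q-values.

    With probability epsilon_at(step) a uniformly random action index is
    returned (explore); otherwise the index of the largest Q-value is
    returned (exploit). In the paper's setting the Q-values would come
    from the BP network's outputs for the current state.
    """
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training the agent explores often; as `step` grows, `epsilon_at(step)` approaches zero and the greedy action dominates. Any monotonically decreasing schedule would fit the description in the abstract equally well.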