Value approximation with least squares support vector machine in reinforcement learning system

WANG Xue-song, TIAN Xi-lan, CHENG Yu-hu, MA Xiao-ping
2007-01-01
Journal of Computational and Theoretical Nanoscience
Abstract: This paper proposes a Q-learning system for continuous state spaces based on the least squares support vector machine (LS-SVM). The LS-SVM realizes the mapping from a state-action pair to its Q value. The method introduces a sliding time window mechanism and a criterion for adding newly observed data to the LS-SVM training sample set. Using the Q values that the LS-SVM estimates online for all state-action pairs, a stochastic action selector draws the actual action from the Boltzmann-Gibbs distribution of those Q values. To improve learning speed, a simulated annealing method dynamically adjusts the temperature of the Boltzmann-Gibbs distribution. Simulation results on the Mountain Car control task show that the proposed Q-learning method has a simple control structure and high learning efficiency. Copyright © 2007 American Scientific Publishers. All rights reserved.
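The action-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names (`boltzmann_probabilities`, `select_action`, `annealed_temperature`), the geometric cooling schedule, and its parameters (`t0`, `decay`, `t_min`) are all assumptions made for illustration. The Q values are passed in as plain numbers; in the paper they would come from the LS-SVM estimator.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return Boltzmann-Gibbs action probabilities for a list of Q values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the largest Q value before exponentiating to avoid overflow.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann-Gibbs distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def annealed_temperature(step, t0=1.0, decay=0.99, t_min=0.01):
    """Hypothetical geometric cooling schedule (an assumption, not from the paper)."""
    return max(t_min, t0 * decay ** step)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # placeholder Q estimates for three actions
    for step in (0, 100, 500):
        temp = annealed_temperature(step)
        print(step, temp, boltzmann_probabilities(q, temp))
```

At a high temperature the probabilities are close to uniform, which favors exploration; as the temperature is lowered, the probability mass concentrates on the action with the largest Q value, which favors exploitation.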