Online support vector regression for reinforcement learning

Zhenhua Yu, Yuanli Cai
2007-01-01
Abstract: The goal of reinforcement learning is to learn the values of state-action pairs so as to maximize the total reward. For the continuous states and actions found in real-world problems, the representation of the value function is critical. Furthermore, samples of the value function are obtained sequentially. An online support vector regression (OSVR) method is therefore set up as a function approximator for estimating value functions in reinforcement learning. OSVR updates the regression function by analyzing how the support vector sets may change after new samples are added to the training set. To evaluate its learning ability, OSVR is applied to the mountain-car task. The simulation results indicate that OSVR has a favorable convergence speed and can solve continuous problems that are infeasible with a lookup table.
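The idea of updating a kernel regressor one sample at a time can be illustrated with a minimal sketch. The code below is not the paper's OSVR, which tracks how the support vector sets change exactly. It swaps in a simpler stochastic-gradient update on the epsilon-insensitive loss, without regularization. The class name `OnlineKernelRegressor`, the helper `rbf`, and the parameters `epsilon`, `lr`, and `kernel_gamma` are all hypothetical names chosen for illustration.

```python
import math


def rbf(x, y, kernel_gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    return math.exp(-kernel_gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OnlineKernelRegressor:
    """Kernel regressor trained one sample at a time.

    Assumption: this is a simplified stand-in (a subgradient step on the
    epsilon-insensitive loss), NOT the exact incremental OSVR of the paper.
    """

    def __init__(self, epsilon=0.1, lr=0.25, kernel_gamma=1.0):
        self.epsilon = epsilon            # half-width of the insensitive tube
        self.lr = lr                      # step size of each update
        self.kernel_gamma = kernel_gamma  # RBF width (not the RL discount)
        self.centers = []                 # stored inputs (support-vector-like)
        self.coeffs = []                  # one signed coefficient per center

    def predict(self, x):
        """Evaluate the current regression function at x."""
        return sum(
            c * rbf(s, x, self.kernel_gamma)
            for s, c in zip(self.centers, self.coeffs)
        )

    def update(self, x, target):
        """Process one new sample; return True if the model changed."""
        err = target - self.predict(x)
        if abs(err) <= self.epsilon:
            return False  # inside the tube: the sample adds nothing
        # Outside the tube: add a kernel centered at x, signed toward the target.
        self.centers.append(tuple(x))
        self.coeffs.append(self.lr * math.copysign(1.0, err))
        return True
```

In a reinforcement learning loop, `x` would typically be a state-action vector and `target` a bootstrapped value estimate such as the reward plus the discounted value of the next state. How the paper forms its targets is not stated in the abstract, so that pairing is an assumption.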