Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph

CHENG Yu-Hu, FENG Huan-Ting, WANG Xue-Song
DOI: https://doi.org/10.3724/sp.j.1004.2011.00044
2011-01-01
ACTA AUTOMATICA SINICA
Abstract: In policy iteration reinforcement learning, the construction of basis functions strongly influences the accuracy of action-value function approximation. To construct appropriate basis functions for this approximation, a policy iteration reinforcement learning method based on geodesic Gaussian bases defined on a state-action graph is proposed. First, a state-action graph for a Markov decision process is constructed using an off-policy method. Second, geodesic Gaussian kernel functions are defined on the state-action graph, and a kernel sparsification approach based on approximate linear dependency is used to automatically select the centers of the geodesic Gaussian kernels. Finally, the geodesic Gaussian kernels defined on the state-action graph are used to approximate the action-value function during policy evaluation, and the policy is then improved based on the estimated action-value function. Simulation results on a 10 × 10 grid world show that the proposed method accurately approximates an action-value function that exhibits both smoothness and discontinuity, using fewer basis functions than policy iteration reinforcement learning methods based on either ordinary Gaussian bases or geodesic Gaussian bases defined on a state graph, which helps in obtaining an optimal policy effectively.
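To make the idea of a geodesic Gaussian kernel on a graph concrete, the following minimal sketch evaluates one kernel centered at a chosen node. It is not the authors' implementation. It assumes the commonly used form exp(-d² / (2σ²)), where d is the shortest-path length on the graph, and it assumes unweighted edges so that breadth-first search suffices. The abstract states neither detail. The function names, the five-node chain, and the value of sigma are hypothetical; in the proposed method the nodes would be state-action pairs.

```python
import math
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Return hop counts from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def geodesic_gaussian(adjacency, center, sigma):
    """Evaluate an (assumed) geodesic Gaussian kernel at every graph node.

    Nodes that cannot be reached from `center` get the value 0.0.
    """
    dist = shortest_path_lengths(adjacency, center)
    return {
        node: math.exp(-dist[node] ** 2 / (2.0 * sigma ** 2)) if node in dist else 0.0
        for node in adjacency
    }


# Hypothetical chain 0-1-2-3-4: nodes 0 and 4 might be close in Euclidean
# terms (e.g. on opposite sides of a wall), but the graph distance is 4 hops.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
phi = geodesic_gaussian(chain, center=0, sigma=1.0)
```

Because the kernel decays with distance along the graph rather than straight-line distance, its value does not leak across obstacles. This is the property that lets such bases follow discontinuities in the action-value function.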