Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering

Ting JI, Hua ZHANG
DOI: https://doi.org/10.13195/j.kzyjc.2016.1148
2017-01-01
Abstract: A nonparametric approximation generalized policy iteration reinforcement learning algorithm based on state clustering (NPAGPI-SC) is proposed to address problems of current approximate policy iteration reinforcement learning algorithms, such as heavy computation and basis function construction that is not fully automated. The algorithm collects samples with a two-stage random sampling process, computes the approximator's initial parameters using a trial-and-error process together with an estimation algorithm for complete coverage of the samples, adjusts the approximator automatically during learning with the delta rule and the nearest neighbor method, and selects actions with a greedy strategy. Simulation results on the balancing control of a single inverted pendulum demonstrate the effectiveness and robustness of the proposed algorithm.
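To make the learning-phase ingredients named in the abstract concrete, the following is a minimal sketch of how a delta rule update, nearest neighbor lookup, and greedy action selection might fit together. It is not the authors' NPAGPI-SC implementation: the class name `NearestNeighborQ`, the `radius` threshold for adding a new cluster center, the tabular per-center action values, and the values of `alpha` and `gamma` are all assumptions made for illustration, and the two-stage sampling and parameter initialization steps are omitted.

```python
import math


class NearestNeighborQ:
    """Hypothetical nonparametric action-value approximator (illustrative only).

    States are mapped to their nearest stored cluster center; each center
    holds one value per action. None of these names come from the paper.
    """

    def __init__(self, n_actions, radius, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.radius = radius   # assumed threshold for creating a new center
        self.alpha = alpha     # learning rate of the delta rule
        self.gamma = gamma     # discount factor
        self.centers = []      # stored cluster centers (tuples of floats)
        self.q = []            # q[i][a]: value of action a at center i

    def _nearest(self, state):
        # Linear scan for the closest center (Euclidean distance).
        best_i, best_d = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(state, center)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def index(self, state):
        # Nearest neighbor lookup; grow the approximator when no stored
        # center lies within `radius` of the state (an assumed rule).
        i, d = self._nearest(state)
        if i is None or d > self.radius:
            self.centers.append(tuple(state))
            self.q.append([0.0] * self.n_actions)
            return len(self.centers) - 1
        return i

    def greedy_action(self, state):
        # Greedy strategy: pick the action with the largest estimated value.
        row = self.q[self.index(state)]
        return max(range(self.n_actions), key=row.__getitem__)

    def update(self, state, action, reward, next_state, done=False):
        # Delta rule: move the estimate toward the one-step target.
        i = self.index(state)
        target = reward
        if not done:
            target += self.gamma * max(self.q[self.index(next_state)])
        self.q[i][action] += self.alpha * (target - self.q[i][action])
```

In this sketch a caller would alternate `greedy_action` and `update` while interacting with an environment such as an inverted pendulum simulator. The linear nearest neighbor scan is chosen for clarity rather than speed.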