Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC

Ting JI, Hua ZHANG
DOI: https://doi.org/10.3778/j.issn.1002-8331.1709-0489
2019-01-01
Abstract: To address the high computational complexity and slow convergence of online approximation policy iteration reinforcement learning, this paper proposes a nonparametric approximation policy iteration reinforcement learning algorithm based on CMAC (NPAPI-CMAC), which uses the CMAC structure as the value function approximator. The CMAC’s generalization parameter is determined by constructing a sampling process, and its state partition is determined by an initial partition followed by a development partition. The reinforcement learning rate is defined by building a set of sample counts for the tiling. In this way, the structure and parameters of the reinforcement learner are constructed fully automatically. In addition, the algorithm uses the delta rule and the nearest neighbor method to adjust its parameters automatically during learning, and uses a greedy strategy to select the action produced by a voting machine. Simulation results on the balancing control of a single inverted pendulum show the effectiveness, robustness, and rapid convergence of the proposed algorithm.
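As a rough illustration of the building blocks named in the abstract, the sketch below shows a generic CMAC (tile-coding) value approximator with a delta-rule weight update and greedy action selection. It is not the NPAPI-CMAC algorithm itself: the class name `CMAC`, the one-dimensional state, the fixed number of tilings and tiles, and the constant learning rate `lr` are all assumptions made for the example, whereas the paper determines the partition and learning rate automatically and selects actions through a voting step that is not modeled here.

```python
class CMAC:
    """Hypothetical tile-coding approximator over a 1-D state in [low, high)."""

    def __init__(self, n_tilings, n_tiles, low, high, n_actions):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low = low
        self.width = (high - low) / n_tiles
        # One weight per (action, tiling, tile); the extra tile absorbs the offset.
        self.w = [[[0.0] * (n_tiles + 1) for _ in range(n_tilings)]
                  for _ in range(n_actions)]

    def active_tiles(self, s):
        """Return the index of the active tile in each offset tiling."""
        tiles = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            idx = int((s - self.low + offset) / self.width)
            tiles.append(min(max(idx, 0), self.n_tiles))
        return tiles

    def value(self, s, a):
        """Approximate Q(s, a) as the sum of the active tile weights."""
        return sum(self.w[a][t][i] for t, i in enumerate(self.active_tiles(s)))

    def update(self, s, a, target, lr):
        """Delta rule: spread lr * error evenly over the active tiles."""
        error = target - self.value(s, a)
        step = lr * error / self.n_tilings
        for t, i in enumerate(self.active_tiles(s)):
            self.w[a][t][i] += step
        return error

    def greedy_action(self, s):
        """Pick the action with the largest approximate value."""
        values = [self.value(s, a) for a in range(len(self.w))]
        return max(range(len(values)), key=values.__getitem__)


# Example: repeatedly pull Q(0.3, action 1) toward a target of 1.0.
cmac = CMAC(n_tilings=4, n_tiles=8, low=-1.0, high=1.0, n_actions=2)
for _ in range(50):
    cmac.update(0.3, 1, target=1.0, lr=0.5)
```

Because neighboring states share some tiles, the update at 0.3 also raises the estimate for nearby states, while distant states are unaffected. This local generalization is the property that makes CMAC attractive as a cheap value function approximator.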