Nonparametric approximation policy iteration reinforcement learning based on Dyna framework

Ting JI, Hua ZHANG
DOI: https://doi.org/10.11772/j.issn.1001-9081.2017102531
2018-01-01
Abstract: To address the problem that the approximator in current approximation policy iteration reinforcement learning cannot be constructed fully automatically, a reinforcement learning algorithm named Nonparametric Approximation Policy Iteration based on Dyna Framework (NPAPI-Dyna) was proposed. A sampling cache and a sampling change rate were introduced to design a two-stage random sampling process for collecting samples. Using profile tolerance and K-means clustering, the core state basis functions were generated through a trial-and-error process. The Q-value function approximator was generated with complete coverage of the samples as the target. A greedy strategy was applied to design the action selector. The access frequency of the state basis functions was used to describe the topological features of the environment and to construct an environment estimation model. The learning and planning processes were combined organically through the identification of the Dyna framework to accelerate learning. In simulation experiments on balance control of a single inverted pendulum, when the reinforcement learning error rate is 0.01, the learning success rate of the algorithm reaches 100%, the minimum number of attempts needed to succeed is only 2, the average number of attempts is only 7.73, the mean absolute deviation of the angle is 3.0538°, and the average oscillation range of the angle is 2.759°. When the reinforcement learning error rate is 0.1, over 100 independent simulation runs, Online-LSPI and BLSPI (Batch Least-Squares Policy Iteration) need more than 150 attempts on average to learn the control strategy, whereas NPAPI-Dyna can succeed within 50 attempts. The experimental results show that NPAPI-Dyna can construct and adjust its learning structure fully automatically, with high learning accuracy and fast convergence.
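The abstract says that learning and planning are combined within the Dyna framework. As a rough illustration of that general idea only, the sketch below implements the textbook tabular Dyna-Q loop, not NPAPI-Dyna itself: it has no basis functions, no K-means clustering, and no sampling cache. The six-state chain environment, the function names (`step`, `dyna_q`, `greedy_policy`), and all hyperparameter values are assumptions made for the example and do not come from the paper.

```python
import random

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q: Q-learning on real steps plus replay from a learned model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (next_state, reward, done)

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    def choose(s):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # learning: real experience
            model[(s, a)] = (s2, r, done)    # record the observed transition
            for _ in range(planning_steps):  # planning: simulated experience
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(dyna_q())` returns the greedy action for each non-terminal state of the toy chain. The planning loop reuses stored transitions, so each real interaction drives many value updates. This is the mechanism by which Dyna-style methods typically need fewer real attempts than learning from real experience alone.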