Abstract: To address the problem that the approximator in current approximate policy iteration reinforcement learning cannot be constructed fully automatically, a reinforcement learning algorithm, Nonparametric Approximation Policy Iteration based on the Dyna framework (NPAPI-Dyna), was proposed. A sampling cache and a sampling change rate were introduced to design a two-stage random sampling process for collecting samples. Using profile tolerance and K-means clustering, the core state basis functions were generated through a trial-and-error process. The Q-value function approximator was generated with complete coverage of the samples as the target. A greedy strategy was applied to design the action selector. The access frequency of the state basis functions was used to describe the topological features of the environment and to construct the environment estimation model. Learning and planning were combined organically through the identification of the Dyna framework to accelerate learning. In simulation experiments on single inverted pendulum balance control, when the reinforcement learning error rate is 0.01, the learning success rate of the algorithm reaches 100%, the minimum number of attempts needed for success is only 2, the average number of attempts is only 7.73, the mean absolute deviation of the angle is 3.0538°, and the average oscillation range of the angle is 2.759°. When the reinforcement learning error rate is 0.1, over 100 independent simulation runs, Online-LSPI and BLSPI (Batch Least-Squares Policy Iteration) need more than 150 attempts on average to learn the control strategy, whereas NPAPI-Dyna succeeds within 50 attempts. The experimental results show that NPAPI-Dyna can construct and adjust its learning structure fully automatically, with high learning accuracy and fast convergence.
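As a rough illustration of how a Dyna-style method interleaves learning from real experience with planning from a learned model, the sketch below implements generic tabular Dyna-Q on a toy chain task. It is not the NPAPI-Dyna algorithm described in the abstract: the chain environment, the function name `dyna_q`, and all parameter values are assumptions chosen only for illustration, and a lookup table stands in for the nonparametric Q-value approximator.

```python
import random

def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Generic tabular Dyna-Q on a toy chain (hypothetical example task).

    States are 0..n_states-1; the right-most state is the goal.
    """
    rng = random.Random(seed)
    actions = (-1, +1)  # move left / move right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def step(s, a):
        # Deterministic toy dynamics, clipped at both ends of the chain.
        s2 = min(max(s + a, 0), goal)
        return (1.0 if s2 == goal else 0.0), s2

    def greedy(s):
        # Greedy action selection with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup; no bootstrapping from the goal state.
        target = r if s2 == goal else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)       # learning from real experience
            model[(s, a)] = (r, s2)   # record the observed transition
            for _ in range(planning_steps):
                # Planning: replay transitions sampled from the learned model.
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned Q-table; on this toy chain the greedy action in every non-goal state should be "move right". The planning loop is what lets each real step trigger several extra value updates, which is the general mechanism by which Dyna-style methods reduce the number of real attempts needed.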

Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering

Nonparametric approximation policy iteration reinforcement learning based on Dyna framework

Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Stochastic Cubic-Regularized Policy Gradient Method

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

Natural Gradient Based Reinforcement Learning Algorithm Using Active Stimulating

Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning

Efficient Reinforcement-Learning Control Algorithm Using Experience Reuse

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time $H_\infty$ Control Problem

Approximate Policy Iteration With Deep Minimax Average Bellman Error Minimization

Approximate Policy Iteration Schemes: A Comparison

Enhanced Probabilistic Inference Algorithm Using Probabilistic Neural Networks For Learning Control

Approximation Benefits of Policy Gradient Methods with Aggregated States

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems

PCFBPI: A Point Clustering Feature Based Policy Iteration Algorithm

Online policy iteration algorithm for semi-Markov switching state-space control processes

Compound Heuristic Information Guided Policy Improvement for Robot Motor Skill Acquisition