Abstract: To address the problem that the approximator in current approximate policy iteration reinforcement learning cannot be constructed fully automatically, a reinforcement learning algorithm called Nonparametric Approximation Policy Iteration based on the Dyna Framework (NPAPI-Dyna) was proposed. A sampling cache and a sampling change rate were introduced to design a two-stage random sampling process for collecting samples. Using profile tolerance and K-means clustering, the core state basis functions were generated through a trial-and-error process. The Q-value function approximator was generated with complete coverage of the samples as the target. A greedy strategy was applied to design the action selector. The access frequency of each state basis function was used to describe the topological features of the environment and to construct an environment estimation model. Learning and planning were combined organically through model identification in the Dyna framework to speed up learning. In simulation experiments on balance control of a single inverted pendulum, with a reinforcement learning error rate of 0.01, the learning success rate of the algorithm reached 100%, the minimum number of attempts needed to succeed was only 2, the average number of attempts was only 7.73, the mean absolute deviation of the angle was 3.0538°, and the average oscillation range of the angle was 2.759°. With a reinforcement learning error rate of 0.1, over 100 independent simulation runs, Online-LSPI and BLSPI (Batch Least-Squares Policy Iteration) required more than 150 attempts on average to learn the control strategy, whereas NPAPI-Dyna succeeded within 50 attempts. The experimental results show that NPAPI-Dyna can construct and adjust its learning structure completely automatically, with high learning accuracy and rapid convergence.
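As a rough illustration of how a Dyna-style method interleaves learning and planning, the sketch below implements generic tabular Dyna-Q. It is not the NPAPI-Dyna algorithm itself: the basis-function approximator, the two-stage sampling and the frequency-based environment model described in the abstract are not reproduced here. The chain environment, the function name `dyna_q` and all parameter values are hypothetical choices made only for this example.

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical 1-D chain (assumption, not the paper's task).

    The agent starts in state 0; reaching the right-most state ends the
    episode with reward 1. All other transitions give reward 0.
    """
    rng = random.Random(seed)
    actions = (-1, +1)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def step(s, a):
        # Real environment: move left or right, clipped to the chain.
        s2 = min(max(s + a, 0), goal)
        return (1.0 if s2 == goal else 0.0), s2

    def greedy(s):
        # Greedy action selection with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup; the goal state is terminal.
        target = r if s2 == goal else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)        # learning from real experience
            model[(s, a)] = (r, s2)    # model identification
            for _ in range(planning_steps):
                # Planning: replay transitions simulated by the learned model.
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q()
    # Greedy action per non-terminal state under the learned Q-values.
    print([max((-1, +1), key=lambda a: q[(s, a)]) for s in range(5)])
```

The planning loop is what distinguishes this from plain Q-learning: each real transition is followed by several simulated backups drawn from the learned model, which is the general mechanism by which Dyna-style methods reduce the number of real attempts required.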

Nonparametric approximation policy iteration reinforcement learning based on Dyna framework
