Abstract: Dyna is a planning paradigm that naturally weaves learning and planning together through environment models. Dyna-style reinforcement learning improves sample efficiency by using simulated experience generated by the environment model to update the value function. However, existing Dyna-style planning methods are usually tabular and are therefore only suitable for tasks with low-dimensional, small-scale state spaces. In addition, the quality of the simulated experience they generate cannot be guaranteed, which significantly limits their application in tasks such as continuous control of high-dimensional robots and autonomous driving. To this end, we propose a model-based approach that controls planning through a validator. The validator filters high-quality experiences for policy learning and decides whether to stop planning. To deal with the exploration-exploitation dilemma in reinforcement learning, we design an action selection strategy that combines the ε-greedy strategy with a simulated annealing (SA) cooling schedule. The performance of the proposed method is demonstrated on a set of classical Atari games. Experimental results show that learning dynamics models can improve sample efficiency in some games, and that this benefit is maximized by choosing a proper number of planning steps. In the optimized planning phase, our method keeps the gap to the current state-of-the-art model-based reinforcement learning method (MuZero) small. Achieving a good compromise between model accuracy and the optimal number of planning steps requires controlling planning appropriately. Applied to a physical robot system, the method helps reduce the influence of an imprecise depth prediction model on the task. Without human supervision, it becomes easier to collect training data and learn complex skills (such as grasping and carrying items), and the method generalizes more effectively to previously unseen tasks.
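The two mechanisms named in the abstract, an ε-greedy strategy whose ε follows a simulated-annealing-style cooling schedule and a validator that filters simulated experience and decides when to stop planning, can be illustrated with a minimal sketch. The abstract does not specify the cooling rule or the validator's criterion, so everything here is an assumption made for illustration: the geometric cooling rule, the acceptance-ratio stopping test, and all names and parameters (`annealed_epsilon`, `select_action`, `plan_with_validator`, `cooling`, `min_accept`) are hypothetical and are not the authors' implementation.

```python
import random


def annealed_epsilon(step, eps_start=1.0, eps_min=0.05, cooling=0.99):
    """Assumed geometric cooling: epsilon shrinks by a constant factor per step,
    in the manner of a simulated annealing temperature, down to a floor."""
    return max(eps_min, eps_start * cooling ** step)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy selection with the annealed epsilon defined above."""
    if rng.random() < annealed_epsilon(step):
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def plan_with_validator(simulate, validate, update, start_states,
                        max_steps=10, min_accept=0.5):
    """Hypothetical validator-controlled planning loop.

    simulate(state) -> simulated transition produced by the learned model
    validate(transition) -> True if the transition is judged good enough
    update(transition) -> applies one learning update from the transition

    Planning stops early once the fraction of accepted transitions in a
    sweep drops below min_accept. Returns the number of completed sweeps.
    """
    if not start_states:
        return 0
    steps_done = 0
    for _ in range(max_steps):
        batch = [simulate(s) for s in start_states]
        accepted = [t for t in batch if validate(t)]
        if len(accepted) / len(batch) < min_accept:
            break  # the model's output is no longer trusted: stop planning
        for t in accepted:
            update(t)
        steps_done += 1
    return steps_done
```

In this sketch the validator plays both roles described in the abstract: rejected transitions never reach the learner, and a low acceptance ratio ends planning before the fixed budget `max_steps` is spent.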

A Phased Dyna Reinforcement Learning Algorithm

An Optimized Dyna Architecture Algorithm with Prioritized Sweeping

Spacecraft Attitude Maneuver Planning Based on Deep Reinforcement Learning under Complex Constraints

Robot Simulation and Reinforcement Learning Training Platform Based on Distributed Architecture

A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation

Adaptive Disassembly Sequence Planning for VR Maintenance Training Via Deep Reinforcement Learning

Asynchronous reinforcement learning algorithms for solving discrete space path planning problems

Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems

Efficient Reinforcement Learning in Continuous State and Action Spaces with Dyna and Policy Approximation

Bayesian Q learning method with Dyna architecture and prioritized sweeping

Dyna-Validator: A Model-based Reinforcement Learning Method with Validated Simulated Experiences

An Improved Dyna-Q Algorithm for Mobile Robot Path Planning in Unknown Dynamic Environment

Improved Dyna-Q: A Reinforcement Learning Method Focused via Heuristic Graph for AGV Path Planning in Dynamic Environments

An Improved Dyna-Q Algorithm Inspired by the Forward Prediction Mechanism in the Rat Brain for Mobile Robot Path Planning

A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms

Nonparametric approximation policy iteration reinforcement learning based on Dyna framework

Deep Q-Learning with Phased Experience Cooperation

An efficient reinforcement learning algorithm for continuous actions

Switch-Based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

A Distributed Path Planning Algorithm via Reinforcement Learning

Actor-Critic Reinforcement Learning with Phased Actor