Abstract: Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) operating in high-dimensional action and state spaces. Recently, a promising approach that combines exploration in the action space with exploration in the parameter space has been proposed to get the best of both methods. In this article, we propose a new iterative, closed-loop framework that combines an evolutionary algorithm (EA), which explores directly in the parameter space in a gradient-free manner, with an actor-critic method, the deep deterministic policy gradient (DDPG) reinforcement learning algorithm, which explores in the action space in a gradient-based manner, so that the two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parametric perturbation part) evolve in a guided manner by using the gradient information provided by the DDPG, while the policy gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population, which improves sample efficiency. In particular, we propose a criterion that determines the number of training steps required for the DDPG, so that useful gradient information can be generated from the EA-generated samples and the DDPG and EA parts work together in a more balanced way during each generation. Furthermore, within the DDPG part, our algorithm can flexibly switch between fine-tuning the previous RL-Actor and fine-tuning a new one generated by the EA, depending on the situation, to further improve efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related works and offers a satisfactory trade-off between stability and sample efficiency.
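The loop described in the abstract (an EA population explored by parameter perturbation, with a gradient-based learner fine-tuning only the best individual in each generation) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the quadratic `fitness` stands in for an episode return, `gradient_fine_tune` is plain gradient ascent standing in for DDPG, and the fixed `steps` value replaces the paper's criterion for choosing the number of training steps. The switch between fine-tuning the previous RL-Actor and a new one is not modeled.

```python
import random


def fitness(theta, target):
    """Toy stand-in for an episode return: negative squared distance to a hypothetical optimum."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def gradient_fine_tune(theta, target, steps, lr=0.1):
    """Stand-in for the DDPG fine-tuning stage: gradient ascent on the toy fitness."""
    theta = list(theta)
    for _ in range(steps):
        # The gradient of -(t - g)^2 with respect to t is -2 * (t - g).
        theta = [t - lr * 2.0 * (t - g) for t, g in zip(theta, target)]
    return theta


def mutate(theta, sigma, rng):
    """Gradient-free exploration: Gaussian perturbation in parameter space."""
    return [t + rng.gauss(0.0, sigma) for t in theta]


def run(generations=20, pop_size=10, dim=3, steps=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    target = [1.0] * dim  # hypothetical optimum of the toy problem
    population = [[rng.uniform(-2.0, 2.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank the population by fitness, best first.
        population.sort(key=lambda th: fitness(th, target), reverse=True)
        best = population[0]
        # Fine-tune only the best individual with the gradient-based learner.
        tuned = gradient_fine_tune(best, target, steps)
        elite = max(best, tuned, key=lambda th: fitness(th, target))
        history.append(fitness(elite, target))
        # Feed the elite back into the population, then perturb the top half.
        parents = [elite] + population[1:pop_size // 2]
        population = [elite] + [mutate(rng.choice(parents), sigma, rng)
                                for _ in range(pop_size - 1)]
    return history


if __name__ == "__main__":
    returns = run()
    print(f"first generation: {returns[0]:.4f}, last generation: {returns[-1]:.3e}")
```

Because the elite is always carried into the next generation, the recorded fitness never decreases in this sketch. A real agent would replace `fitness` with environment rollouts and `gradient_fine_tune` with actor-critic updates on replayed samples.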

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning
