Abstract: Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are two major paradigms of policy optimization with distinct learning principles, i.e., gradient-based vs. gradient-free. An appealing research direction is to integrate Deep RL and EA, devising new methods that fuse their complementary advantages. However, existing works on combining Deep RL and EA have two common drawbacks: 1) the RL agent and the EA agents learn their policies individually, neglecting efficient sharing of useful common knowledge; 2) parameter-level policy optimization does not guarantee semantically meaningful, behavior-level evolution on the EA side. In this paper, we propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re2), a novel solution to these two drawbacks. The key idea of ERL-Re2 is two-scale representation: all EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations. The state representation conveys expressive common features of the environment learned collectively by all the agents; the linear policy representation provides a favorable space for efficient policy optimization, in which novel behavior-level crossover and mutation operations can be performed. Moreover, the linear policy representation allows convenient generalization of policy fitness with the help of a Policy-extended Value Function Approximator (PeVFA), further improving the sample efficiency of fitness estimation. Experiments on a range of continuous control tasks show that ERL-Re2 consistently outperforms strong baselines and achieves significant improvement over both its Deep RL and EA components.
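
The two-scale representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-layer encoder, the dimensions, and the helper names (`state_representation`, `act`, `crossover`, `mutate`) are all hypothetical, and the way crossover and mutation act on whole action dimensions is only one plausible reading of "behavior-level" operations. The RL update of the shared encoder and the PeVFA fitness estimate are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, FEATURE_DIM, ACTION_DIM, POP_SIZE = 8, 16, 3, 4

# Shared nonlinear state representation: a single encoder used by every agent.
# (Assumption: one tanh layer; a real encoder would be deeper and trained.)
W_shared = rng.normal(scale=0.1, size=(FEATURE_DIM, STATE_DIM))


def state_representation(state):
    """Map a raw state to the feature vector shared by all policies."""
    return np.tanh(W_shared @ state)


# Individual linear policy representations: one weight matrix per agent.
# Row i holds the weights (plus a bias column) that produce action dimension i.
population = [
    rng.normal(scale=0.1, size=(ACTION_DIM, FEATURE_DIM + 1))
    for _ in range(POP_SIZE)
]


def act(policy, state):
    """Apply one agent's linear policy on top of the shared features."""
    features = np.append(state_representation(state), 1.0)  # append bias input
    return np.tanh(policy @ features)


def crossover(parent_a, parent_b, action_dims):
    """Copy parent_a, but take the listed action dimensions from parent_b."""
    child = parent_a.copy()
    child[action_dims] = parent_b[action_dims]
    return child


def mutate(policy, action_dims, scale=0.1):
    """Add Gaussian noise only to the rows of the listed action dimensions."""
    child = policy.copy()
    child[action_dims] += rng.normal(scale=scale, size=child[action_dims].shape)
    return child


state = rng.normal(size=STATE_DIM)
child = crossover(population[0], population[1], action_dims=[1])
mutant = mutate(population[0], action_dims=[2])
```

Because each action dimension depends on exactly one row of the linear policy, swapping or perturbing a row changes only that dimension of the behavior: here `act(child, state)` matches `population[1]` on action dimension 1 and `population[0]` elsewhere. This is the sense in which operating in the linear policy space can be behavior-level rather than an arbitrary parameter-level edit.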

Episodic Reinforcement Learning with Expanded State-reward Space

Sample Efficient Reinforcement Learning Method Via High Efficient Episodic Memory

Neural Episodic Control with State Abstraction

Episodic Reinforcement Learning with Associative Memory

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Regioned Episodic Reinforcement Learning

Episodic Memory Deep Q-Networks

Bi-phase Episodic Memory-Guided Deep Reinforcement Learning for Robot Skills

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments

Deep Reinforcement Learning with Parametric Episodic Memory

State Representation Learning for Effective Deep Reinforcement Learning

Offline Reinforcement Learning with Value-based Episodic Memory

Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation

ACDER: Augmented Curiosity-Driven Experience Replay

Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration