Abstract: Policy gradient algorithms for reinforcement learning (RL) have successfully tackled a broad range of high-dimensional continuous RL problems, including many challenging robotic control problems. These algorithms fall largely into two categories: on-policy and off-policy. Off-policy deep RL (DRL) algorithms are more sample-efficient than on-policy algorithms and often outperform them. However, cutting-edge off-policy algorithms still suffer from low-quality estimation of policy gradients, which compromises learning performance and makes them highly sensitive to hyper-parameter settings. To address this issue, we propose a new concept, the robust policy gradient (RPG). Driven by RPG, and inspired by the recent success of several ensemble DRL algorithms, this paper develops a new policy ensemble gradient (PEG) algorithm for DRL. PEG estimates RPG efficiently and effectively by using the policy gradients obtained from each of several off-policy base learners in an ensemble. The estimated RPG is then used to train all base learners simultaneously. Comprehensive experiments were performed on six MuJoCo benchmark problems. Compared with four state-of-the-art off-policy algorithms and four cutting-edge ensemble policy gradient algorithms, PEG achieved highly competitive stability, performance, and sample efficiency. Further analysis shows that PEG is insensitive to varied hyper-parameter settings, confirming the positive role of RPG in building reliable and effective off-policy DRL algorithms.

Policy ensemble gradient for continuous control problems in deep reinforcement learning