Abstract: This work focuses on sample collection in reinforcement learning (RL), where interaction with the environment is typically time-consuming and expensive. To collect samples more effectively, we propose a confidence-based sampling strategy built on the optimal computing budget allocation (OCBA) algorithm, which actively allocates computing effort to actions according to their predictive uncertainties. We estimate the uncertainty with ensembles and generalize the estimates from tabular representations to function approximation. The OCBA-based sampling strategy can be readily integrated into various off-policy RL algorithms; we take Q-learning, DQN, and SAC as examples to show the incorporation. In addition, we provide a theoretical convergence analysis and evaluate the algorithms experimentally. In the experiments, the incorporated algorithms obtain remarkable gains compared with modern ensemble-based RL algorithms.

Note to Practitioners: Reinforcement learning is a powerful tool for handling sequential decision-making problems, e.g., autonomous driving and robotics control, where actions typically have a long-term effect on future events. However, although RL achieves human-level control in some tasks, it suffers severely from low sample efficiency. Applying RL in some practical areas, e.g., healthcare and rescue, is therefore extremely hard because of the massive number of samples required. This work aims to enhance exploration in RL by incorporating OCBA, which provides an asymptotically optimal data-collection strategy for simulation-based optimization. Based on ensemble-based uncertainty estimation and OCBA-based action selection, the incorporated RL algorithms show competitive performance on many benchmarks and significantly reduce the sampling effort during iterations.
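The idea of allocating sampling effort by predictive uncertainty can be illustrated with the classical OCBA allocation rule applied to ensemble Q-value estimates. The following is a minimal sketch, not the paper's implementation: the function names `ocba_allocation` and `select_action`, the `eps` safeguard, and the choice to sample an action from the normalized allocation ratios are all assumptions made for illustration. The abstract does not specify how the ratios are turned into an action choice.

```python
import numpy as np


def ocba_allocation(means, stds, eps=1e-8):
    """Classical OCBA budget fractions for picking the action with the largest mean.

    means, stds: per-action mean and standard deviation of the value estimate
    (here assumed to come from an ensemble). Returns fractions that sum to 1.
    """
    means = np.asarray(means, dtype=float)
    stds = np.maximum(np.asarray(stds, dtype=float), eps)  # avoid zero variance
    if means.size == 1:
        return np.ones(1)

    best = int(np.argmax(means))
    # Gap between the current best action and every other action.
    gaps = np.maximum(means[best] - means, eps)

    # Non-best actions: N_i proportional to (sigma_i / delta_{b,i})^2.
    ratios = (stds / gaps) ** 2
    ratios[best] = 0.0
    # Best action: N_b = sigma_b * sqrt(sum_{i != b} (N_i / sigma_i)^2).
    ratios[best] = stds[best] * np.sqrt(np.sum((ratios / stds) ** 2))

    return ratios / ratios.sum()


def select_action(q_ensemble, rng):
    """Sample an action from the OCBA fractions (an illustrative assumption).

    q_ensemble: array of shape (ensemble_size, num_actions) with Q-value
    estimates for a single state.
    """
    q = np.asarray(q_ensemble, dtype=float)
    probs = ocba_allocation(q.mean(axis=0), q.std(axis=0))
    return int(rng.choice(len(probs), p=probs))


# Hypothetical ensemble of 5 Q-estimates over 3 actions.
rng = np.random.default_rng(0)
q_ensemble = rng.normal(loc=[1.0, 0.8, 0.0], scale=[0.1, 0.3, 0.1], size=(5, 3))
action = select_action(q_ensemble, rng)
```

Under this rule, an action receives more samples when its estimate is uncertain or when its mean is close to that of the current best action, so effort concentrates on the comparisons that are hardest to resolve.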

Novelty-based Sample Reuse for Continuous Robotics Control

Efficient Novelty Search Through Deep Reinforcement Learning

Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning

Dynamics-aware Novelty Search with Behavior Repulsion

State-Novelty Guided Action Persistence in Deep Reinforcement Learning

A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

An OCBA-Based Method for Efficient Sample Collection in Reinforcement Learning

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

Sample Efficient Deep Reinforcement Learning via Local Planning

State Distribution-Aware Sampling for Deep Q-Learning

Learning End-to-End Visual Servoing Using an Improved Soft Actor-Critic Approach with Centralized Novelty Measurement

Neural Episodic Control with State Abstraction

Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization

NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning

Signal Novelty Detection as an Intrinsic Reward for Robotics

Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

A novelty-search-based evolutionary reinforcement learning algorithm for continuous optimization problems