Abstract: This work focuses on sample collection in reinforcement learning (RL), where interaction with the environment is typically time-consuming and expensive. To collect more informative samples, we propose a confidence-based sampling strategy built on the optimal computing budget allocation (OCBA) algorithm, which actively allocates computing effort to actions with different predictive uncertainties. We estimate the uncertainty with ensembles and generalize the estimates from tabular representations to function approximation. The OCBA-based sampling strategy can be easily integrated into various off-policy RL algorithms; we take Q-learning, DQN, and SAC as examples to show the integration. We also provide a theoretical analysis of convergence and evaluate the algorithms experimentally. In the experiments, the resulting algorithms obtain remarkable gains compared with modern ensemble-based RL algorithms.

Note to Practitioners: Reinforcement learning is a powerful tool for sequential decision-making problems, e.g., autonomous driving and robotics control, where actions typically have long-term effects on future events. However, although RL achieves human-level control in some tasks, it suffers severely from low sample efficiency. Applying RL in practical areas such as healthcare and rescue is therefore extremely hard, because it requires a massive number of samples. This work aims to enhance exploration in RL by incorporating OCBA, which provides an asymptotically optimal data-collection strategy for simulation-based optimization. Using ensemble-based uncertainty estimation and OCBA-based action selection, the resulting RL algorithms show competitive performance on many benchmarks and significantly reduce the sampling effort during iterations.
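The following is a minimal sketch, not the authors' implementation, of how the classical OCBA allocation ratios could be turned into an action-sampling distribution from ensemble Q-value estimates for a single state. The function name `ocba_action_probs`, the array layout, the use of the ensemble standard deviation as the uncertainty estimate, and the small constant `eps` are all assumptions made for illustration; the exact rule used in the paper may differ.

```python
import numpy as np


def ocba_action_probs(q_ensemble, eps=1e-8):
    """Hypothetical helper: map ensemble Q estimates to sampling probabilities.

    q_ensemble: array-like of shape (n_members, n_actions) holding each
    ensemble member's Q-value estimate for every action in one state.
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)                # ensemble mean per action
    std = q.std(axis=0, ddof=1) + eps    # ensemble spread as uncertainty (assumption)
    best = int(np.argmax(mean))          # currently best-looking action

    # Classical OCBA ratio for non-best actions: N_i proportional to (sigma_i / delta_i)^2,
    # where delta_i is the gap between the best mean and the mean of action i.
    gap = mean[best] - mean + eps
    ratio = (std / gap) ** 2
    ratio[best] = 0.0

    # Classical OCBA ratio for the best action:
    # N_b = sigma_b * sqrt(sum over i != b of (N_i / sigma_i)^2).
    ratio[best] = std[best] * np.sqrt(np.sum((ratio / std) ** 2))

    total = ratio.sum()
    if total <= 0.0:
        # Degenerate case (e.g. a single action): fall back to uniform.
        return np.full(len(ratio), 1.0 / len(ratio))
    return ratio / total


if __name__ == "__main__":
    # Three ensemble members, three actions (made-up numbers).
    q_values = [[1.0, 0.0, 0.5],
                [1.2, 0.1, 0.3],
                [0.8, -0.1, 0.7]]
    probs = ocba_action_probs(q_values)
    rng = np.random.default_rng(0)
    action = rng.choice(len(probs), p=probs)  # sample the next action to try
    print(probs, action)
```

Under this rule, an action receives more sampling effort when its estimate is uncertain and its mean is close to the best one, while clearly inferior, low-variance actions are sampled rarely.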

On Efficient Sampling in Supervisory Reinforcement Learning

On Efficient Sampling in Offline Reinforcement Learning

On Sampling Efficiency Optimization in Constrained Reinforcement Learning

On Efficient Sampling for Reinforcement Learning with Multiple Constraints

A Rank-Based Sampling Framework for Offline Reinforcement Learning

An OCBA-Based Method for Efficient Sample Collection in Reinforcement Learning

Optimistic Sampling Strategy for Data-Efficient Reinforcement Learning

Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

Nearly Minimax Optimal Reward-free Reinforcement Learning

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Generalizable Policy Improvement Via Reinforcement Sampling (Student Abstract)

Sample-Efficient Deep Reinforcement Learning Via Balance Sample

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

Sample-Efficient Reinforcement Learning Via Conservative Model-Based Actor-Critic

Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning

Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks

State Distribution-Aware Sampling for Deep Q-Learning

Deep Q-learning Sampling Based on Advantages

Efficient Sampling Policy for Selecting a Good Enough Subset

Online Reinforcement Learning via Posterior Sampling of Policy

On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond