Abstract: This work focuses on sample collection in reinforcement learning (RL), where interaction with the environment is typically time-consuming and highly expensive. To collect more valuable samples, we propose a confidence-based sampling strategy built on the optimal computing budget allocation (OCBA) algorithm, which actively allocates computing effort among actions according to their predictive uncertainties. We estimate these uncertainties with ensembles and generalize the estimation from tabular representations to function approximation. The OCBA-based sampling strategy can be easily integrated into various off-policy RL algorithms; we use Q-learning, DQN, and SAC as examples to show the integration. We also provide a theoretical convergence analysis and evaluate the algorithms experimentally. In the experiments, the resulting algorithms achieve remarkable gains over modern ensemble-based RL algorithms.

Note to Practitioners: Reinforcement learning is a powerful tool for sequential decision-making problems such as autonomous driving and robotic control, where actions typically have long-term effects on future events. Although RL has achieved human-level control in some tasks, it suffers severely from low sample efficiency. As a result, applying RL in practical areas such as healthcare and rescue is extremely hard because of the massive number of samples required. This work aims to enhance exploration in RL by incorporating OCBA, which provides an asymptotically optimal data-collection strategy for simulation-based optimization. The resulting RL algorithms combine ensemble-based uncertainty estimation with OCBA-based action selection. They show competitive performance on many benchmarks and significantly reduce sampling effort over the learning iterations.
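The OCBA-based action selection described above can be illustrated with a minimal Python sketch. This is an assumption for illustration and not necessarily the authors' procedure. The sketch treats the ensemble mean and standard deviation of each action's Q-value estimates as the sample mean and standard deviation in the classic OCBA rule for selecting the best alternative. Let b be the action with the largest mean and delta_{b,i} = mean_b - mean_i the gap to action i. Each non-best action then receives a budget N_i proportional to (sigma_i / delta_{b,i})^2. The best action receives N_b = sigma_b * sqrt(sum over i != b of N_i^2 / sigma_i^2). Several details are hypothetical choices made here: the helper names `ocba_ratios` and `ocba_action`, the `eps` clipping, and drawing the action with probability proportional to its allocation.

```python
import numpy as np


def ocba_ratios(means, stds, eps=1e-8):
    """Classic OCBA budget fractions for identifying the largest-mean alternative.

    means, stds: 1-D sequences with one entry per action.
    Returns nonnegative fractions that sum to 1.
    """
    means = np.asarray(means, dtype=float)
    stds = np.maximum(np.asarray(stds, dtype=float), eps)  # guard against zero spread
    k = len(means)
    if k == 1:
        return np.ones(1)  # a single action receives the whole budget
    b = int(np.argmax(means))  # current best action
    others = np.arange(k) != b
    # Gaps to the best action; clipped so exact ties do not divide by zero.
    gaps = np.maximum(means[b] - means[others], eps)
    ratios = np.zeros(k)
    # Non-best actions: N_i proportional to (sigma_i / delta_{b,i})^2.
    ratios[others] = (stds[others] / gaps) ** 2
    # Best action: N_b = sigma_b * sqrt(sum_{i != b} N_i^2 / sigma_i^2).
    ratios[b] = stds[b] * np.sqrt(np.sum(ratios[others] ** 2 / stds[others] ** 2))
    return ratios / ratios.sum()


def ocba_action(q_ensemble, rng):
    """Hypothetical OCBA-style action choice from an ensemble of Q-value estimates.

    q_ensemble: array of shape (n_members, n_actions), n_members >= 2, holding
    each ensemble member's Q(s, a) estimate for the current state s.
    Returns the sampled action index and the allocation fractions.
    """
    q = np.asarray(q_ensemble, dtype=float)
    means = q.mean(axis=0)  # ensemble mean per action
    stds = q.std(axis=0, ddof=1)  # ensemble disagreement as the uncertainty proxy
    p = ocba_ratios(means, stds)
    # Assumed rule: sample the action in proportion to its OCBA allocation.
    return int(rng.choice(len(p), p=p)), p


rng = np.random.default_rng(0)
q_values = rng.normal(size=(5, 4))  # 5 ensemble members, 4 actions (synthetic)
action, fractions = ocba_action(q_values, rng)
```

Under this rule, an action receives more of the sampling budget when its ensemble members disagree (large sigma_i) or when its mean is close to the current best (small delta_{b,i}). This matches the abstract's idea of allocating effort according to predictive uncertainty.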

An OCBA-Based Method for Efficient Sample Collection in Reinforcement Learning