Abstract: This work focuses on sample collection in reinforcement learning (RL), where interaction with the environment is typically time-consuming and expensive. To collect more informative samples, we propose a confidence-based sampling strategy built on the optimal computing budget allocation (OCBA) algorithm, which actively allocates computing effort to actions according to their predictive uncertainties. We estimate the uncertainty with ensembles and generalize the estimates from tabular representations to function approximation. The OCBA-based sampling strategy can be easily integrated into various off-policy RL algorithms; we take Q-learning, DQN, and SAC as examples of the integration. We also provide a theoretical convergence analysis and evaluate the algorithms experimentally. In the experiments, the resulting algorithms obtain remarkable gains over modern ensemble-based RL algorithms.

Note to Practitioners: Reinforcement learning is a powerful tool for sequential decision-making problems, e.g., autonomous driving and robotic control, where actions typically have long-term effects on future events. However, although RL achieves human-level control in some tasks, it suffers severely from low sample efficiency. Implementing RL in some practical areas, e.g., healthcare and rescue, is therefore extremely hard because of the massive number of samples required. This work aims to enhance exploration in RL by incorporating OCBA, which provides an asymptotically optimal data-collection strategy for simulation-based optimization. Based on ensemble-based uncertainty estimation and OCBA-based action selection, the resulting RL algorithms show competitive performance on many benchmarks and significantly reduce the sampling effort during iterations.
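To make the idea of OCBA-based action selection concrete, the following is a minimal sketch, not the paper's implementation. It assumes the classical OCBA allocation rule from the simulation-optimization literature, applied to the per-action mean and standard deviation of an ensemble of Q-value estimates at a single state. The function name `ocba_action_probs`, the `eps` smoothing term, and the use of the normalized allocation as a sampling distribution are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def ocba_action_probs(q_ensemble, eps=1e-8):
    """Turn ensemble Q-estimates into an OCBA-style sampling distribution.

    q_ensemble: array of shape (n_members, n_actions) holding each ensemble
    member's Q-value estimate for every action at one state (hypothetical
    input layout, assumed for this sketch).
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)
    # Ensemble spread as the uncertainty estimate; eps avoids zero variance.
    std = q.std(axis=0, ddof=1) + eps
    n_actions = mean.size
    if n_actions == 1:
        return np.ones(1)

    best = int(np.argmax(mean))
    gap = mean[best] - mean  # non-negative gap to the current best action
    others = np.arange(n_actions) != best

    ratio = np.zeros(n_actions)
    # Classical OCBA: for non-best actions, N_i is proportional to
    # (sigma_i / delta_{b,i})^2, so noisy, close competitors get more samples.
    ratio[others] = (std[others] / (gap[others] + eps)) ** 2
    # Classical OCBA: N_b = sigma_b * sqrt(sum_{i != b} N_i^2 / sigma_i^2).
    ratio[best] = std[best] * np.sqrt(np.sum((ratio[others] / std[others]) ** 2))
    return ratio / ratio.sum()


if __name__ == "__main__":
    # Five ensemble members, three actions (made-up numbers for illustration).
    q = np.array([
        [1.0, 0.4, 0.0],
        [1.2, 0.6, 0.1],
        [0.8, 0.5, -0.1],
        [1.0, 0.7, 0.0],
        [1.0, 0.3, 0.0],
    ])
    probs = ocba_action_probs(q)
    action = np.random.default_rng(0).choice(len(probs), p=probs)
    print(probs, action)
```

Under these assumptions, an action receives more sampling effort when its ensemble estimates disagree strongly or when its mean is close to that of the current best action. In an off-policy algorithm such as Q-learning or DQN, a distribution of this kind could stand in for epsilon-greedy action selection, although how the paper actually performs the integration is not stated in the abstract.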

Sample-Efficient Deep Reinforcement Learning Via Balance Sample

Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

State Distribution-Aware Sampling for Deep Q-Learning

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Deep Q-learning Sampling Based on Advantages

Deep Q-Learning with Prioritized Sampling

Sampling Efficient Deep Reinforcement Learning through Preference-Guided Stochastic Exploration

Sample Efficient Deep Reinforcement Learning via Local Planning

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment

Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion

Re-attentive experience replay in off-policy reinforcement learning

Posterior Sampling for Deep Reinforcement Learning

Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents

An OCBA-Based Method for Efficient Sample Collection in Reinforcement Learning

Sample-Efficient Reinforcement Learning Via Conservative Model-Based Actor-Critic

Episodic Memory Deep Q-Networks

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games