Abstract: This work focuses on sample collection in reinforcement learning (RL), where interaction with the environment is typically time-consuming and expensive. To collect more valuable samples, we propose a confidence-based sampling strategy built on the optimal computing budget allocation (OCBA) algorithm, which actively allocates computing effort to actions according to their predictive uncertainties. We estimate the uncertainty with ensembles and generalize the estimates from tabular representations to function approximation. The OCBA-based sampling strategy can be integrated into various off-policy RL algorithms; we take Q-learning, DQN, and SAC as examples to show the incorporation. In addition, we provide a theoretical convergence analysis and evaluate the algorithms experimentally. In the experiments, the incorporated algorithms obtain remarkable gains compared with modern ensemble-based RL algorithms.

Note to Practitioners: Reinforcement learning is a powerful tool for handling sequential decision-making problems, e.g., autonomous driving and robotic control, where behaviors typically have a long-term effect on future events. However, although RL achieves human-level control in some tasks, it suffers severely from low sample efficiency. Implementing RL in some practical areas, e.g., healthcare and rescue, is therefore extremely hard because of the massive number of samples required. This work aims to enhance exploration in RL by incorporating OCBA, which provides an asymptotically optimal data-collection strategy for simulation-based optimization. Based on ensemble-based uncertainty estimation and OCBA-based action selection, the incorporated RL algorithms show competitive performance on many benchmarks and significantly reduce the sampling effort during iterations.
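The OCBA-based action selection summarized in the abstract can be illustrated with a small sketch. It applies the classical OCBA allocation rule (for non-best actions, the sampling ratio is proportional to (sigma_i / delta_i)^2, where delta_i is the gap to the best mean; for the best action b, it is sigma_b * sqrt(sum_i (N_i / sigma_i)^2)) to per-action means and standard deviations computed from an ensemble of Q-estimates. The function names `ocba_fractions` and `select_action`, the `eps` floor, and the choice to sample an action in proportion to the allocation fractions are assumptions made for illustration; the abstract does not specify how the paper maps the allocation to action selection.

```python
import math
import random
import statistics


def ocba_fractions(means, stds, eps=1e-8):
    """Classical OCBA allocation fractions for picking the largest mean.

    means, stds: per-action mean and standard deviation of the value estimate.
    eps: hypothetical floor that avoids division by zero for ties or zero spread.
    """
    k = len(means)
    if k == 1:
        return [1.0]
    stds = [max(s, eps) for s in stds]
    best = max(range(k), key=lambda i: means[i])

    ratio = [0.0] * k
    for i in range(k):
        if i != best:
            gap = max(means[best] - means[i], eps)
            # Non-best actions: noisier or closer-to-best actions get more budget.
            ratio[i] = (stds[i] / gap) ** 2
    # Best action: sigma_b * sqrt(sum over non-best of (N_i / sigma_i)^2).
    ratio[best] = stds[best] * math.sqrt(
        sum((ratio[i] / stds[i]) ** 2 for i in range(k) if i != best)
    )
    total = sum(ratio)
    return [r / total for r in ratio]


def select_action(q_ensemble, rng=random):
    """Sample an action in proportion to its OCBA fraction (an assumption).

    q_ensemble: list of ensemble members, each a list of Q-values per action.
    Needs at least two members so the sample standard deviation is defined.
    """
    n_actions = len(q_ensemble[0])
    columns = [[member[a] for member in q_ensemble] for a in range(n_actions)]
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    probs = ocba_fractions(means, stds)
    return rng.choices(range(n_actions), weights=probs)[0]


if __name__ == "__main__":
    # Three hypothetical ensemble members, two actions.
    ensemble = [[1.0, 0.2], [0.8, 0.1], [1.1, 0.4]]
    print(select_action(ensemble, random.Random(0)))
```

With equal standard deviations, an action whose mean is closer to the current best receives a larger share of the budget than one that is clearly worse, which is the behavior the confidence-based strategy relies on.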

Accelerating Robot Reinforcement Learning with Samples of Different Simulation Precision

Robot Simulation and Reinforcement Learning Training Platform Based on Distributed Architecture

Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion

SAM-RL: Sensing-aware model-based reinforcement learning via differentiable physics-based simulation and rendering

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Deep Reinforcement Learning Based Robot Arm Manipulation with Efficient Training Data through Simulation

Simulation-Aided Policy Tuning for Black-Box Robot Learning

Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

End-to-End and Highly-Efficient Differentiable Simulation for Robotics

A Mobile Robot Experiment System with Lightweight Simulator Generator for Deep Reinforcement Learning Algorithm

Sim-To-Real Transfer for Miniature Autonomous Car Racing

Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer

Sample Efficient Deep Reinforcement Learning via Local Planning

An OCBA-Based Method for Efficient Sample Collection in Reinforcement Learning

Learning Quadrotor Control From Visual Features Using Differentiable Simulation

Asynchronous Methods for Model-Based Reinforcement Learning

Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration

Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation

Sim-to-Real Transfer with Action Mapping and State Prediction for Robot Motion Control

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse