Abstract: Reinforcement Learning (RL) has achieved great success in sequential decision-making problems, but often at the cost of a large number of agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process. In practice, these demonstrations, which are often collected from human users, are costly and hence often constrained to a limited amount. How to select the best set of human demonstrations that is most beneficial for learning therefore becomes a major concern. This paper presents EARLY (Episodic Active Learning from demonstration querY), an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space. Based on a trajectory-level estimate of uncertainty in the agent's current policy, EARLY determines the optimized timing and content for feature-based queries. By querying episodic demonstrations as opposed to isolated state-action pairs, EARLY improves the human teaching experience and achieves better learning performance. We validate the effectiveness of our method in three simulated navigation tasks of increasing difficulty. The results show that our method is able to achieve expert-level performance for all three tasks with convergence over 30% faster than other baseline methods when demonstrations are generated by simulated oracle policies. The results of a follow-up pilot user study (N=18) further validate that our method can still maintain a significantly better convergence in the case of human expert demonstrators while achieving a better user experience in perceived task load and consuming significantly less human time.
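
The idea of using a trajectory-level uncertainty estimate to decide when to query and what to ask for can be illustrated with a minimal sketch. This is not the EARLY algorithm itself: the abstract does not specify the uncertainty measure, the query rule, or the feature space, so everything below is an assumption made for illustration. In particular, the `Episode` type, the use of per-step absolute TD errors as the uncertainty proxy, the mean as the aggregate, the `ratio` threshold, and the use of an episode's start state as the query feature are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence, Tuple


@dataclass
class Episode:
    # Hypothetical trajectory feature: where the episode started.
    start_state: Tuple[float, ...]
    # Assumed per-step uncertainty proxy, e.g. absolute TD errors.
    step_uncertainties: Sequence[float]


def trajectory_uncertainty(ep: Episode) -> float:
    """Aggregate per-step values into one trajectory-level score (mean)."""
    return mean(ep.step_uncertainties) if ep.step_uncertainties else 0.0


def select_query(history: List[Episode],
                 ratio: float = 1.5) -> Optional[Tuple[float, ...]]:
    """Decide whether to query (timing) and what to ask for (content).

    Timing: query only if the latest episode is markedly more uncertain
    than the average of the earlier ones (assumed rule).
    Content: the start state of the most uncertain episode seen so far.
    Returns None when no query should be issued.
    """
    if len(history) < 2:
        return None
    scores = [trajectory_uncertainty(ep) for ep in history]
    baseline = mean(scores[:-1])
    if scores[-1] <= ratio * baseline:
        return None
    worst = max(range(len(history)), key=lambda i: scores[i])
    return history[worst].start_state
```

Under these assumptions, the agent would ask the expert for a full episodic demonstration starting from the returned feature, rather than for a single state-action label.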

Accelerating Wargaming Reinforcement Learning by Dynamic Multi-Demonstrator Ensemble

Expert demonstrations guide reward decomposition for multi-agent cooperation

A Method for High-Value Driving Demonstration Data Generation Based on One-Dimensional Deep Convolutional Generative Adversarial Networks

Active Deep Q-learning with Demonstration

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Deep Q-learning From Demonstrations

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

An Improved Approach Towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Demonstration actor critic

Overcoming Exploration in Reinforcement Learning with Demonstrations

A Deep Reinforcement Learning-Based Method Applied for Solving Multi-Agent Defense and Attack Problems

A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space

ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations

Continuous Reinforcement Learning From Human Demonstrations With Integrated Experience Replay For Autonomous Driving

Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

"Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations