Abstract: Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL to be fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating user feedback into the training loop. In HITL RL, human input is integrated during the agent's learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent's performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a point that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable better human-agent interaction in HITL RL for all types of users, whether lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases of human involvement in creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively.

Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities
