Abstract: Reinforcement learning discovers optimal behavior strategies through trial and error, and it has become a general method for solving environmental interaction problems. However, as a machine learning method, it shares a problem common to machine learning: a lack of explainability. This limits the application of reinforcement learning in safety-sensitive fields such as medical treatment and transportation, and it leads to a lack of universally applicable solutions in environmental simulation and task generalization. To address the problem, extensive research on explainable reinforcement learning (XRL) has emerged. However, the research community still lacks a consistent understanding of XRL. Therefore, this study explores the basic problems of XRL and reviews existing work. First, the study discusses the parent problem, i.e., explainable artificial intelligence, and summarizes its existing definitions. Next, it constructs a theoretical system of interpretability to describe the problems common to XRL and explainable artificial intelligence. Specifically, it distinguishes between intelligent algorithms and mechanical algorithms, defines interpretability, discusses factors that affect interpretability, and classifies the intuitiveness of interpretability. Then, based on the characteristics of reinforcement learning, the study defines three problems unique to XRL, i.e., environmental interpretation, task interpretation, and strategy interpretation. After that, the latest research on XRL is reviewed, and the existing methods are systematically classified. Finally, future research directions for XRL are put forward.

Explainable Reinforcement Learning: Basic Problems Exploration and Method Survey