Abstract: Reinforcement learning (RL) has shown great promise in simulated environments, such as games, where failures have minimal consequences. However, the deployment of RL agents in real-world systems such as autonomous vehicles, robotics, UAVs, and medical devices demands a higher level of safety and transparency, particularly when facing adversarial threats. Safe RL algorithms have been developed to address these concerns by optimizing both task performance and safety constraints. However, errors are inevitable, and when they occur, it is essential that the RL agents can also explain their actions to human operators. This makes trust in the safety mechanisms of RL systems crucial for effective deployment. Explainability plays a key role in building this trust by providing clear, actionable insights into the agent's decision-making process, ensuring that safety-critical decisions are well understood. While machine learning (ML) has seen significant advances in interpretability and visualization, explainability methods for RL remain limited. Current tools fail to address the dynamic, sequential nature of RL and its need to balance task performance with safety constraints over time. The re-purposing of traditional ML methods, such as saliency maps, is inadequate for safety-critical RL applications where mistakes can result in severe consequences. To bridge this gap, we propose xSRL, a framework that integrates both local and global explanations to provide a comprehensive understanding of RL agents' behavior. xSRL also enables developers to identify policy vulnerabilities through adversarial attacks, offering tools to debug and patch agents without retraining. Our experiments and user studies demonstrate xSRL's effectiveness in increasing safety in RL systems, making them more reliable and trustworthy for real-world deployment.
Code is available at https://github.com/risal-shefin/xSRL.

xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability
