Explainable Reinforcement Learning via a Causal World Model

Zhongwei Yu, Jingqing Ruan, Dengpeng Xing
2024-01-18
Abstract: Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty of generating explanations in reinforcement learning (RL). Because an action can have long-term effects on the future, explaining why an agent chose it is hard. The paper proposes a framework for explainable RL that learns a causal world model without prior knowledge of the environment's causal structure. The model captures the influence of actions, so their long-term effects can be explained through causal chains that show how an action influences environmental variables and ultimately leads to rewards.

### Main Contributions

1. **Learning Causal Model**: Learn a causal model that captures environmental dynamics without prior knowledge of the causal structure.
2. **Extracting Causal Influence**: Design a novel method that extracts the causal influence of actions and derives causal chains for explaining agent decisions.
3. **Model Accuracy**: Demonstrate that the explanatory model is accurate enough to guide policy learning in model-based reinforcement learning (MBRL).

### Background

- **Application of Causality in Reinforcement Learning**: In recent years, researchers have begun integrating causality into reinforcement learning to improve robustness and learning efficiency.
- **Existing Explanation Methods**: Most existing explainable reinforcement learning (XRL) methods rely on classic explainable artificial intelligence (XAI) tools such as saliency maps, which are weak at explaining temporal dependence.
- **Advantages of Causal Models**: Psychological research shows that people explain the world through causality. The proposed method uses causal discovery to construct a sparse, interpretable world model instead of a dense, fully connected one.

### Methods

1. **Causal Discovery**: Assume that the output variables are generated independently given the input variables, and perform efficient causal discovery through conditional independence testing (CIT).
2. **Inference Network with Attention Mechanism**: Use an inference network with an attention mechanism to fit the structural equations of the causal graph and capture the effects of actions.
3. **Causal Chain Analysis**: Generate explanations by constructing causal chains that reveal the causal influence of agent actions on environmental variables.

### Experimental Results

- **Lunarlander-Continuous**: In a continuous action space, causal chains explain how the agent reduces its distance to the target location and balances the lander by adjusting speed and angle.
- **Build-Marine**: In a discrete action space, causal chains explain how the agent unlocks the ability to build barracks by building a supply depot, and how it secures sufficient resources to build more Marine units.

### Conclusion

The proposed method not only generates accurate explanations but also maintains high performance in model-based reinforcement learning, achieving a good balance between interpretability and learning performance.
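
To make the causal discovery step under Methods more concrete, the sketch below tests each input-output pair for conditional independence given the remaining inputs and keeps an edge only when dependence is detected. This is a minimal sketch under stated assumptions, not the paper's implementation: a thresholded plug-in estimate of conditional mutual information stands in for a proper statistical test, and the toy variables (`pos`, `fuel`, `action`), their dynamics, and the `threshold` value are all hypothetical.

```python
import math
import random
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), count in n_xyz.items():
        ratio = (count * n_z[z]) / (n_xz[(x, z)] * n_yz[(y, z)])
        total += (count / n) * math.log(ratio)
    return total


def discover_parents(inputs, outputs, threshold=0.02):
    """Return {output: set of inputs it depends on}.

    An edge input -> output is kept when the input and the output are
    dependent given all other inputs. `threshold` is a hypothetical cutoff.
    """
    graph = {}
    for out_name, out_vals in outputs.items():
        parents = set()
        for in_name, in_vals in inputs.items():
            others = [vals for name, vals in inputs.items() if name != in_name]
            # Condition on the joint value of all remaining inputs.
            cond = list(zip(*others)) if others else [()] * len(in_vals)
            if conditional_mutual_information(in_vals, out_vals, cond) > threshold:
                parents.add(in_name)
        graph[out_name] = parents
    return graph


# Hypothetical toy environment with binary variables.
rng = random.Random(0)
samples = 2000
inputs = {
    "pos": [rng.randint(0, 1) for _ in range(samples)],
    "fuel": [rng.randint(0, 1) for _ in range(samples)],
    "action": [rng.randint(0, 1) for _ in range(samples)],
}
outputs = {
    # next_pos depends only on pos and action.
    "next_pos": [p ^ a for p, a in zip(inputs["pos"], inputs["action"])],
    # next_fuel depends only on fuel and action.
    "next_fuel": [f if a == 0 else 0
                  for f, a in zip(inputs["fuel"], inputs["action"])],
}

graph = discover_parents(inputs, outputs)
for name in sorted(graph):
    print(name, "<-", sorted(graph[name]))
```

In this toy setting the recovered graph is sparse: `fuel` is not a parent of `next_pos` and `pos` is not a parent of `next_fuel`. Testing each output separately is what the independence assumption on output variables allows; without it, dependencies among outputs would also have to be examined.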
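
The causal chain analysis described under Methods can be pictured as path enumeration over a discovered graph. The sketch below is illustrative only: it enumerates structural paths from an action to the reward and ignores the attention-based influence weights that the summary attributes to the inference network. The graph is a hypothetical toy loosely modeled on the Build-Marine description, and `causal_chains`, `max_steps`, and all variable names are assumptions introduced here.

```python
def causal_chains(parents, source, target, max_steps):
    """Enumerate chains source -> ... -> target using at most `max_steps` edges.

    `parents` maps each variable to the set of variables that directly
    influence it; each edge is read as one step forward in time.
    """
    # Invert the parent map so the graph can be walked forward from the action.
    children = {}
    for child, pars in parents.items():
        for parent in pars:
            children.setdefault(parent, set()).add(child)

    chains = []

    def extend(path):
        node = path[-1]
        if node == target:
            chains.append(path)
            return
        if len(path) - 1 >= max_steps:  # edge budget exhausted
            return
        for child in sorted(children.get(node, ())):
            extend(path + [child])

    extend([source])
    return chains


# Hypothetical causal graph, not taken from the paper.
parents = {
    "supply_depots": {"action"},
    "barracks": {"action", "supply_depots"},
    "marines": {"action", "barracks"},
    "reward": {"marines"},
}

chains = causal_chains(parents, "action", "reward", max_steps=4)
for chain in chains:
    print(" -> ".join(chain))
```

Each printed chain reads as an explanation of the form "this action affects variable A, which affects variable B, which leads to reward". Bounding the search with `max_steps` keeps the enumeration finite if the graph contains self-loops, which are common when a variable depends on its own previous value.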