Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations

Michael Harradon, Jeff Druce, Brian Ruttenberg
DOI: https://doi.org/10.48550/arXiv.1802.00541
2018-02-02
Abstract: Deep neural networks are complex and opaque. As they enter application in a variety of important and safety critical domains, users seek methods to explain their output predictions. We develop an approach to explaining deep neural networks by constructing causal models on salient concepts contained in a CNN. We develop methods to extract salient concepts throughout a target network by using autoencoders trained to extract human-understandable representations of network activations. We then build a Bayesian causal model using these extracted concepts as variables in order to explain image classification. Finally, we use this causal model to identify and visualize features with significant causal influence on final classification.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the interpretability of deep neural networks (DNNs). Specifically, the authors aim to explain a DNN's output predictions by constructing a causal model whose variables are human-understandable concepts. The key points of the paper are as follows:

1. **Problem Background**:
   - DNNs are widely used in many important and safety-critical fields, such as autonomous driving and medical image diagnosis.
   - However, DNNs are usually complex and opaque, which makes it difficult to understand or explain their output predictions.
   - In some high-risk fields (such as the criminal justice system, child welfare intervention, and cancer diagnosis), the lack of interpretability may have serious social consequences.
2. **Shortcomings of Existing Methods**:
   - Current methods (such as GradCAM and LRP) have improved the interpretability of DNNs, but they do not provide causal explanations.
   - These methods mainly generate saliency maps that indicate which pixels in the image influence the output of a CNN, but they do not establish specific causal relationships.
3. **Research Objectives**:
   - Construct a causal model that explains the operation of a DNN and lets users perform arbitrary causal interventions and queries.
   - Extract low-dimensional, human-understandable concepts to serve as variables in the causal model.
   - Verify experimentally how these concepts influence the output of the DNN, thereby improving the interpretability of the model.
4. **Specific Methods**:
   - Use autoencoders to extract human-understandable concepts from the activations of the DNN.
   - Construct a Bayesian causal model that takes these concepts as variables in order to explain image classification.
   - Through causal intervention experiments, identify and visualize the features that have a significant causal influence on the final classification.
5. **Contributions**:
   - Propose a causal model of DNN operation, based on human-understandable concepts, to aid interpretability.
   - Develop an unsupervised technique for extracting concepts from a DNN that are likely to be human-understandable.
   - Propose a method for measuring the causal effects of inputs and concepts on the output of the DNN.

In summary, the paper introduces a causal learning and explanation framework for deep neural networks, with the goal of making DNNs more reliable and trustworthy in critical application domains.
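
The idea of intervening on an extracted concept and measuring the change in the classifier's output can be illustrated with a small NumPy sketch. This is not the authors' implementation. The names `intervention_effect`, `encode`, `decode`, and `head` are hypothetical, a fixed orthonormal linear projection stands in for a trained autoencoder, and a toy softmax layer stands in for the remainder of the network. The mean absolute change in class probabilities is likewise just one simple choice of effect measure, assumed here for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def intervention_effect(activations, encode, decode, head, concept_idx, value=0.0):
    """Mean absolute change in class probabilities when one concept is clamped.

    The baseline uses the reconstructed activations (decode(encode(x))), so the
    difference reflects only the intervention, not reconstruction error.
    """
    codes = encode(activations)
    baseline = head(decode(codes))
    clamped = codes.copy()
    clamped[:, concept_idx] = value  # intervention: force the concept to `value`
    intervened = head(decode(clamped))
    return np.abs(intervened - baseline).mean(axis=0)


# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 8))              # stand-in for layer activations
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))  # 8 -> 3 projection, orthonormal columns
M = np.array([[2.0, -2.0],                    # only concept 0 drives the logits
              [0.0, 0.0],
              [0.0, 0.0]])


def encode(a):
    """Stand-in encoder: project activations onto 3 concept coordinates."""
    return a @ Q


def decode(c):
    """Stand-in decoder: map concept coordinates back to activation space."""
    return c @ Q.T


def head(a):
    """Stand-in for the rest of the network: activations -> class probabilities."""
    return softmax(a @ Q @ M)


# One row per concept, one column per class.
effects = np.array(
    [intervention_effect(acts, encode, decode, head, k) for k in range(3)]
)
print(effects)
```

In this toy setup only the row for concept 0 is noticeably non-zero, because the stand-in head was built to ignore the other two concepts. With a real network, `encode` and `decode` would come from an autoencoder trained on a layer's activations, and `head` would be the layers downstream of that activation.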