Abstract: Deep neural networks are complex and opaque. As they enter application in a variety of important and safety-critical domains, users seek methods to explain their output predictions. We develop an approach to explaining deep neural networks by constructing causal models on salient concepts contained in a CNN. We develop methods to extract salient concepts throughout a target network by using autoencoders trained to extract human-understandable representations of network activations. We then build a Bayesian causal model using these extracted concepts as variables in order to explain image classification. Finally, we use this causal model to identify and visualize features with significant causal influence on final classification.

What problem does this paper attempt to address?

This paper addresses the interpretability and comprehensibility of deep neural networks (DNNs). Specifically, the authors aim to explain the output predictions of a DNN by constructing a causal model, and to express these explanations in terms of human-understandable concepts. The key points of the paper are as follows:

1. **Problem Background**:
   - DNNs are widely used in many important and safety-critical fields, such as autonomous driving and medical image diagnosis.
   - However, DNNs are usually complex and opaque, which makes it difficult to understand or explain their output predictions.
   - In some high-risk fields (such as the criminal justice system, child welfare intervention, and cancer diagnosis), the lack of interpretability may have serious social consequences.
2. **Shortcomings of Existing Methods**:
   - Current methods (such as GradCAM and LRP) have improved the interpretability of DNNs, but they do not provide causal explanations.
   - These methods mainly generate saliency maps that indicate which pixels in the image affect the output of a CNN, but they do not identify specific causal relationships.
3. **Research Objectives**:
   - Construct a causal model that explains the operation of a DNN and allows users to perform arbitrary causal interventions and queries.
   - Extract low-dimensional, human-understandable concepts to serve as variables in the causal model.
   - Verify experimentally the influence of these concepts on the output of the DNN, thereby improving the interpretability and comprehensibility of the model.
4. **Specific Methods**:
   - Use autoencoders to extract human-understandable concepts from the activations of the DNN.
   - Construct a Bayesian causal model that takes these concepts as variables to explain image classification.
   - Through causal intervention experiments, identify and visualize the features that have a significant causal impact on the final classification.
5. **Contributions**:
   - Propose a causal model of DNN operation, based on human-understandable concepts, to aid interpretability and comprehensibility.
   - Develop an unsupervised technique for extracting concepts from a DNN that are likely to be human-understandable.
   - Propose a method for measuring the causal effects of inputs and concepts on the output of a DNN.

In summary, this paper aims to address the interpretability and comprehensibility of deep neural networks in practical applications by introducing a causal learning and explanation framework, with the goal of making the application of DNNs in critical fields more reliable and trustworthy.
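The intervention idea summarized above (encode activations into a few concept variables, clamp one concept, and observe how the prediction changes) can be illustrated with a small sketch. This is not the paper's implementation. It assumes a linear autoencoder approximated by PCA in place of a trained autoencoder, a toy linear softmax head standing in for the rest of the network, and mean absolute change in class probability as the effect measure. All function names (`fit_linear_autoencoder`, `concept_effects`, `toy_head`) are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(acts, k):
    """PCA stand-in for a trained autoencoder (an assumption of this sketch)."""
    mean = acts.mean(axis=0)
    # Rows of vt are the principal directions of the centered activations.
    _, _, vt = np.linalg.svd(acts - mean, full_matrices=False)
    return mean, vt[:k]

def encode(acts, mean, comps):
    return (acts - mean) @ comps.T

def decode(codes, mean, comps):
    return codes @ comps + mean

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def concept_effects(acts, head, mean, comps, target):
    """Mean absolute change in P(target) when each concept is clamped to 0."""
    codes = encode(acts, mean, comps)
    base = softmax(head(decode(codes, mean, comps)))[:, target]
    effects = []
    for j in range(codes.shape[1]):
        intervened = codes.copy()
        intervened[:, j] = 0.0  # intervention: do(concept_j = 0)
        p = softmax(head(decode(intervened, mean, comps)))[:, target]
        effects.append(float(np.mean(np.abs(base - p))))
    return np.array(effects)

# Synthetic "activations": one direction dominates and drives class 1.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 8))
acts[:, 0] *= 5.0

weights = np.zeros((8, 2))
weights[0, 1] = 1.0  # the logit for class 1 depends only on activation 0

def toy_head(a):
    """Hypothetical remainder of the network: a fixed linear layer."""
    return a @ weights

mean, comps = fit_linear_autoencoder(acts, k=3)
effects = concept_effects(acts, toy_head, mean, comps, target=1)
print(effects)
```

In this toy setup, the concept aligned with the dominant activation direction should show the largest effect, since it is the only one the head depends on. A real application would replace the PCA step with autoencoders trained on the target network's activations and the toy head with the network's actual downstream layers.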

Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning

Explaining Classifiers with Causal Concept Effect (CaCE)

Understanding CNN Hidden Neuron Activations Using Structured Background Knowledge and Deductive Reasoning

Explaining Deep Learning Models using Causal Inference

Causality in Neural Networks -- An Extended Abstract

Embedding deep networks into visual explanations

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Explaining Predictions of Deep Neural Classifier via Activation Analysis

Causal Feature Attribution: Towards a Trustworthy and Actionable Explanations of Deep Neural Network

Causal Deep Learning: Causal Capsules and Tensor Transformers

Deep Learning for Case-Based Reasoning Through Prototypes: A Neural Network That Explains Its Predictions

Amortized learning of neural causal representations

Explaining Deep Neural Networks by Leveraging Intrinsic Methods

Concept Activation Regions: A Generalized Framework For Concept-Based Explanations

Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset

Causal Deep Learning

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey

Unsupervised Learning of Neural Networks to Explain Neural Networks