Abstract: With the widespread application of Deep Learning (DL), its black-box nature raises concerns, especially in high-stakes decision-making fields such as autonomous driving. Consequently, there is growing demand for research on the interpretability of DL, which has made eXplainable Artificial Intelligence (XAI) a research hotspot. Current research on DL interpretability focuses primarily on two directions: transparency and post-hoc interpretability. Transparency-based approaches often require targeted modifications to the model structure, which can compromise the model's accuracy. Post-hoc approaches, by contrast, usually require no adjustment to the model itself. To provide fast and accurate counterfactual explanations of DL without compromising its performance, this paper proposes a post-hoc interpretation method, relevance inference based on direct contribution, which applies counterfactual reasoning to DL. In this method, direct contribution is first designed by improving Layer-wise Relevance Propagation (LRP) to measure the relevance between the outputs and the inputs. Counterfactual examples are then produced based on direct contribution. Finally, counterfactual results for the DL model are obtained from these counterfactual examples. These counterfactual results describe the behavioral boundaries of the model, facilitating a better understanding of its behavior. Additionally, direct contribution offers an easily implementable interpretable analysis method for studying model behavior. Experiments conducted on various datasets demonstrate that relevance inference generates counterfactual examples more efficiently and accurately than state-of-the-art methods, aiding the analysis of behavioral boundaries in intelligent decision-making models for vehicles.
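
As a rough illustration of the pipeline the abstract describes (relevance scores from outputs back to inputs, then counterfactual examples built from them), the sketch below uses the standard LRP-epsilon rule on a tiny bias-free ReLU network, followed by a greedy feature-resetting search. It is not the paper's direct-contribution measure or its counterfactual generator, whose definitions the abstract does not give. The network weights, the function names (`forward`, `lrp_epsilon`, `greedy_counterfactual`), and the zero baseline are all assumptions made for illustration.

```python
# Illustrative sketch only: standard LRP-epsilon plus a greedy search,
# standing in for the paper's direct-contribution method (not specified here).

def forward(x, W1, W2):
    """Bias-free two-layer ReLU network: returns hidden activations and output scores."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return h, y

def argmax(v):
    return max(range(len(v)), key=lambda k: v[k])

def lrp_layer(a, W, R_out, eps=1e-9):
    """LRP-epsilon: redistribute the relevance of a layer's outputs onto its inputs a."""
    R_in = [0.0] * len(a)
    for k, row in enumerate(W):
        z = sum(w * aj for w, aj in zip(row, a))
        z += eps if z >= 0 else -eps  # stabiliser avoids division by zero
        for j, (w, aj) in enumerate(zip(row, a)):
            R_in[j] += aj * w / z * R_out[k]
    return R_in

def lrp_epsilon(x, W1, W2, target):
    """Relevance of each input feature for the score of class `target`."""
    h, y = forward(x, W1, W2)
    R_y = [y[k] if k == target else 0.0 for k in range(len(y))]
    R_h = lrp_layer(h, W2, R_y)   # output layer -> hidden layer
    return lrp_layer(x, W1, R_h)  # hidden layer -> input features

def greedy_counterfactual(x, W1, W2, baseline=0.0):
    """Reset features to `baseline`, most relevant first, until the predicted class flips."""
    original = argmax(forward(x, W1, W2)[1])
    R = lrp_epsilon(x, W1, W2, original)
    order = sorted(range(len(x)), key=lambda j: R[j], reverse=True)
    cf, changed = list(x), []
    for j in order:
        cf[j] = baseline
        changed.append(j)
        if argmax(forward(cf, W1, W2)[1]) != original:
            return cf, changed
    return None, changed  # no flip found with this simple strategy

# Hypothetical toy weights and input (assumptions, not from the paper).
W1 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.4, 0.2]

relevance = lrp_epsilon(x, W1, W2, target=0)
cf, changed = greedy_counterfactual(x, W1, W2)
```

In this bias-free toy network the input relevances sum (up to the epsilon stabiliser) to the score of the explained class, which is the conservation property LRP is built around. The features listed in `changed` indicate how few inputs had to move before the prediction crossed a decision boundary.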

Interpreting Black-box Machine Learning Models for High Dimensional Datasets

An Interpretable Probabilistic Approach for Demystifying Black-box Predictive Models

Opening the Black Box of Neural Networks: Methods for Interpreting Neural Network Models in Clinical Applications

Interpreting the Black Box of Supervised Learning Models: Visualizing the Impacts of Features on Prediction

A Grey-Box Ensemble Model Exploiting Black-Box Accuracy and White-Box Intrinsic Interpretability

Interpretability of deep learning models: A survey of results

A Survey of the Interpretability Aspect of Deep Learning Models

Understanding the black-box: towards interpretable and reliable deep learning models

Relevance Inference Based on Direct Contribution: Counterfactual Explanation to Deep Networks for Intelligent Decision-making

An Interpretable Neural Network Model Through Piecewise Linear Approximation

Interpreting Deep Learning Model Using Rule-based Method

Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence

Interpreting Deep Classifier by Visual Distillation of Dark Knowledge

Unwrapping The Black Box of Deep ReLU Networks: Interpretability, Diagnostics, and Simplification

Interpretable Model-Agnostic Explanations Based on Feature Relationships for High-Performance Computing

Explanations of Black-Box Models based on Directional Feature Interactions

Illuminating the Black Box: Interpreting Deep Neural Network Models for Psychiatric Research

Learning outside the Black-Box: The pursuit of interpretable models

VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces

Domain Level Interpretability: Interpreting Black-box Model with Domain-specific Embedding

Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond