Abstract: Image classification is a primary task in data analysis, and many applications demand explainable models. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches are inefficient at extracting global knowledge about the classification task; they are therefore vulnerable to local traps and often yield poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated class-associated and individual codes. Recombining the individual code of a given sample with an altered class-associated code yields a synthetic, real-looking sample with preserved individual characteristics but modified class-associated features and possibly a flipped class assignment. We propose a building-block coherency feature extraction algorithm that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Each individual sample can then be explained in a counterfactual generation manner, which continuously modifies the sample in one direction by shifting its class-associated code along a guided path until its classification outcome changes. We compare our method with state-of-the-art methods on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracy. The code is available at https://github.com/xrt11/XAI-CODE.
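
The counterfactual procedure described in the abstract (keep the individual code fixed, shift the class-associated code along a path, stop when the classifier's decision flips) can be illustrated with a minimal sketch. This is not the authors' implementation: `decode`, `classify`, and `counterfactual_path` are hypothetical toy stand-ins, the "decoder" simply concatenates the two codes, the "classifier" is a sign test on the class-associated part, and the guided path is assumed to be a straight line toward a target class-associated code.

```python
import numpy as np


def decode(class_code, individual_code):
    """Hypothetical decoder: a real model would synthesize an image;
    this toy version just concatenates the two codes."""
    return np.concatenate([class_code, individual_code])


def classify(sample, n_class_dims):
    """Hypothetical black-box classifier: label 1 if the class-associated
    part of the sample sums to a positive value, else label 0."""
    return int(sample[:n_class_dims].sum() > 0)


def counterfactual_path(class_code, individual_code, target_class_code, steps=20):
    """Move the class-associated code linearly toward target_class_code,
    keeping the individual code fixed, until the predicted label changes.

    Returns (t, sample) at the first flip, or (None, None) if no flip occurs.
    """
    n = class_code.shape[0]
    original_label = classify(decode(class_code, individual_code), n)
    for t in np.linspace(0.0, 1.0, steps + 1):
        shifted = (1.0 - t) * class_code + t * target_class_code
        sample = decode(shifted, individual_code)
        if classify(sample, n) != original_label:
            return float(t), sample
    return None, None


if __name__ == "__main__":
    class_code = np.array([-1.0, -0.5])         # assumed code of a class-0 sample
    target_code = np.array([1.0, 1.0])          # assumed code from the class-1 region
    individual_code = np.array([0.3, 0.7, 0.5])  # sample-specific part, never modified

    t, counterfactual = counterfactual_path(class_code, individual_code, target_code)
    print("label flips at path position t =", t)
    print("counterfactual sample:", counterfactual)
```

In this toy setting the individual part of the returned sample is identical to the input, while only the class-associated part has moved. In a saliency-map setting, the difference between the original and the counterfactual sample would play the role of the explanation.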

Improving the Quality of Explanations with Local Embedding Perturbations

Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning

LP-Explain: Local Pictorial Explanation for Outliers

XPROAX-Local explanations for text classification with progressive neighborhood approximation

GLEAMS: Bridging the Gap Between Local and Global Explanations

Model Agnostic Multilevel Explanations

Generative Example-Based Explanations: Bridging the Gap between Generative Modeling and Explainability

Locality Pursuit Embedding

Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations

ExplainLFS: Explaining neural architectures for similarity learning from local perturbations in the latent feature space

LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations

Dual feature-based and example-based explanation methods

Leveraging Local Structure for Improving Model Explanations: An Information Propagation Approach

Local explanation methods for deep neural networks lack sensitivity to parameter values

Evaluating Local Explanations using White-box Models

Accurate Explanation Model for Image Classifiers using Class Association Embedding

Learning local discrete features in explainable-by-design convolutional neural networks

Global-to-Local Support Spectrums for Language Model Explainability

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

BMB-LIME: LIME with modeling local nonlinearity and uncertainty in explainability

Locally Linear Embedding Preserving Local Neighborhood