Abstract: Image classification is a primary task in data analysis, and many applications demand explainable models. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches are inefficient at extracting global knowledge about the classification task; they are therefore vulnerable to local traps and often yield poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated codes: a class-associated code and an individual code. Recombining the individual code of a given sample with an altered class-associated code yields a synthetic, real-looking sample that preserves the individual characteristics but has modified class-associated features and possibly a flipped class assignment. We propose a building-block coherency feature extraction algorithm that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Each individual sample can then be explained by counterfactual generation: the sample is modified continuously in one direction, by shifting its class-associated code along a guided path, until its classification outcome changes. We compare our method with state-of-the-art methods on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracy. The code is available at https://github.com/xrt11/XAI-CODE.
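The counterfactual procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `counterfactual_path`, the callables `decode` and `classify`, and the use of straight-line interpolation between two class-associated codes are all assumptions made for illustration (the abstract only says the code is shifted "along a guided path"). The toy decoder and classifier at the bottom are hypothetical stand-ins for trained networks.

```python
import numpy as np

def counterfactual_path(class_code, target_code, individual_code,
                        decode, classify, steps=10):
    """Shift a class-associated code toward a target code until the label flips.

    Assumption: the "guided path" is approximated here by linear interpolation.
    Returns (sample, shifted_code, new_label), or None if the label never changes.
    """
    original_label = classify(decode(class_code, individual_code))
    # Skip t = 0, which reproduces the original sample.
    for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
        shifted = (1.0 - t) * class_code + t * target_code
        # The individual code is held fixed, so only class-associated
        # features change in the synthetic sample.
        sample = decode(shifted, individual_code)
        label = classify(sample)
        if label != original_label:
            return sample, shifted, label
    return None

# Hypothetical stand-ins: the "decoder" concatenates the two codes and the
# "classifier" thresholds the sum of the class-associated part.
def toy_decode(class_code, individual_code):
    return np.concatenate([class_code, individual_code])

def toy_classify(sample):
    return int(sample[:2].sum() > 0)

class_code = np.array([-1.0, -1.0])       # class-associated code of the sample
target_code = np.array([1.0, 1.0])        # class-associated code from the other class
individual_code = np.array([0.5, 0.2, 0.1])

result = counterfactual_path(class_code, target_code, individual_code,
                             toy_decode, toy_classify)
```

In this toy setting the returned sample keeps the individual code unchanged while its label differs from the original one, which mirrors the "preserved individual characteristics, flipped class assignment" behavior the abstract describes.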

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals

Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

This actually looks like that: Proto-BagNets for local and global interpretability-by-design

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?

Improving Interpretability of Deep Neural Networks in Medical Diagnosis by Investigating the Individual Units

Visual Interpretable and Explainable Deep Learning Models for Brain Tumor MRI and COVID-19 Chest X-ray Images

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model

From local explanations to global understanding with explainable AI for trees

Towards Multi-dimensional Explanation Alignment for Medical Classification

IMPA-Net: Interpretable Multi-Part Attention Network for Trustworthy Brain Tumor Classification from MRI

Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings

Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Exemplars and Counterexemplars Explanations for Image Classifiers, Targeting Skin Lesion Labeling

Towards explainable classifiers using the counterfactual approach -- global explanations for discovering bias in data

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Solving the enigma: Deriving optimal explanations of deep networks

Explainable Deep Image Classifiers for Skin Lesion Diagnosis

Accurate Explanation Model for Image Classifiers using Class Association Embedding

Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification