Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals

Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner

2023-08-08

Abstract: Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.

Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing

What problem does this paper attempt to address?

The paper addresses the lack of interpretability of deep learning models, especially black-box neural networks, in high-stakes application areas such as medical image analysis. Although these models classify accurately, they do not explain their predictions, which can lead to mistrust and suboptimal collaboration between humans and machine learning systems. Post-hoc explanation techniques are widely used but suffer from severe conceptual problems, and the paper shows that they perform inadequately in the multi-label scenario, where multiple medical findings may co-occur in a single image. The paper therefore proposes Attri-Net, an inherently interpretable multi-label classification model that provides transparent, trustworthy, and human-understandable explanations. Attri-Net first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to each medical finding, and then uses a simple logistic regression classifier to make predictions based solely on these attribution maps. The paper evaluates Attri-Net on three chest X-ray datasets against five post-hoc explanation techniques and one inherently interpretable classifier. It finds that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge, with classification performance comparable to state-of-the-art classification models.
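
The second stage described above, a logistic regression that sees only the class-specific attribution maps, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `predict_from_attribution_maps`, the array shapes, the choice of one weight per pixel per class, and the random arrays standing in for generated attribution maps are all assumptions made for illustration. How the attribution maps themselves are generated from counterfactuals is not shown here.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_from_attribution_maps(maps, weights, biases):
    """Per-class logistic regression on class-specific attribution maps.

    maps:    (C, H, W) array, one attribution map per class (assumed input).
    weights: (C, H, W) array, one linear weight per pixel per class (assumed).
    biases:  (C,) array, one bias per class (assumed).
    Returns a (C,) array of independent per-class probabilities, so several
    findings can be predicted as present in the same image (multi-label).
    """
    num_classes = maps.shape[0]
    flat_maps = maps.reshape(num_classes, -1)
    flat_weights = weights.reshape(num_classes, -1)
    # Each class's logit depends only on that class's own attribution map.
    logits = (flat_maps * flat_weights).sum(axis=1) + biases
    return sigmoid(logits)


# Hypothetical stand-ins: 3 classes with 8x8 attribution maps.
rng = np.random.default_rng(0)
maps = rng.normal(size=(3, 8, 8))
weights = rng.normal(size=(3, 8, 8))
biases = np.zeros(3)

probs = predict_from_attribution_maps(maps, weights, biases)
print(probs.shape)
```

Because each prediction in this sketch is a linear function of a single attribution map followed by a sigmoid, the map for a class is the only evidence that class's prediction can draw on, which is the property the summary describes as making predictions "based on these attribution maps".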

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?

Towards Multi-dimensional Explanation Alignment for Medical Classification

Explaining Black-box Models for Biomedical Text Classification

Improving Interpretability of Deep Neural Networks in Medical Diagnosis by Investigating the Individual Units

Explainable Deep Image Classifiers for Skin Lesion Diagnosis

Domain aware medical image classifier interpretation by counterfactual impact analysis

Exemplars and Counterexemplars Explanations for Image Classifiers, Targeting Skin Lesion Labeling

Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

Explaining the Black-box Smoothly- A Counterfactual Approach

Explaining black-box text classifiers for disease-treatment information extraction

Transparent and Clinically Interpretable AI for Lung Cancer Detection in Chest X-Rays

FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Multiple Different Black Box Explanations for Image Classifiers

Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model

IMPA-Net: Interpretable Multi-Part Attention Network for Trustworthy Brain Tumor Classification from MRI

Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

Interpreting and Correcting Medical Image Classification with PIP-Net

Interpretable3D: an Ad-Hoc Interpretable Classifier for 3D Point Clouds