Abstract: Explaining the predictions of black-box neural networks is crucial when they are applied to decision-critical tasks. Thus, attribution maps are commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, similarities of latent features to prototypes are linearly classified to form predictions, and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations, using the example of ProtoPNet. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They make it possible to faithfully answer which prototypes are present in an unseen image and to quantify each pixel's contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet in all experiments by a factor of $>10^3$.
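To make the idea of Shapley values computed on a similarity score concrete, the following is a minimal sketch, not the ProtoPFaith procedure itself. It computes exact Shapley values by enumerating all coalitions of a three-feature input. The function `toy_similarity`, the prototype vector, the input, and the all-zeros baseline are hypothetical choices made only for illustration; real images would need an approximation, since exact enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, x, baseline):
    """Exact Shapley values of score(x) relative to a baseline input."""
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition with `size` members (excluding i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Coalition features keep their real value, the rest use the baseline.
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (score(with_i) - score(without_i))
    return values


def toy_similarity(features, prototype=(1.0, 0.0, 2.0)):
    # Hypothetical similarity: negative squared distance to an assumed prototype.
    return -sum((f - p) ** 2 for f, p in zip(features, prototype))


x = [1.0, 1.0, 0.0]          # assumed input features
baseline = [0.0, 0.0, 0.0]   # assumed "feature absent" reference
phi = shapley_values(toy_similarity, x, baseline)

# Efficiency axiom: attributions sum to score(x) - score(baseline).
gap = toy_similarity(x) - toy_similarity(baseline)
print(phi, sum(phi), gap)
```

In this toy setting the attributions add up to the difference between the similarity of the input and that of the baseline, which is the kind of axiom the abstract refers to when it speaks of faithful explanations.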

This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation

Which Neural Network Makes More Explainable Decisions? An Approach Towards Measuring Explainability

Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks

Prototypical Self-Explainable Models Without Re-training

Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning

Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

This Looks Better than That: Better Interpretable Models with ProtoPNeXt

ProtGNN: Towards Self-Explaining Graph Neural Networks

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

ProtoS-ViT: Visual foundation models for sparse self-explainable classifications

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

Improving Network Interpretability via Explanation Consistency Evaluation

Explaining Deep Neural Networks by Leveraging Intrinsic Methods

SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers

CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models

P2ExNet: Patch-based Prototype Explanation Network

The Intriguing Properties of Model Explanations

What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks