Quantitative Attributions with Counterfactuals

Diane-Yayra Adjavon,Nils Eckstein,Alexander S. Bates,Gregory S.X.E. Jefferis,Jan Funke
DOI: https://doi.org/10.1101/2024.11.26.625505
2024-12-02
Abstract:We address the problem of explaining the decision process of deep neural network classifiers on images, which is of particular importance in biomedical datasets where class-relevant differences are not always obvious to a human observer. Our proposed solution, termed quantitative attribution with counterfactuals (QuAC), generates visual explanations that highlight class-relevant differences by attributing the classifier decision to changes of visual features in small parts of an image. To that end, we train a separate network to generate counterfactual images (i.e., to translate images between different classes). We then find the most important differences using novel discriminative attribution methods. Crucially, QuAC allows scoring of the attribution and thus provides a measure to quantify and compare the fidelity of a visual explanation. We demonstrate the suitability and limitations of QuAC on two datasets: (1) a synthetic dataset with known class differences, representing different levels of protein aggregation in cells and (2) an electron microscopy dataset of D. melanogaster synapses with different neurotransmitters, where QuAC reveals so far unknown visual differences. We further discuss how QuAC can be used to interrogate mispredictions to shed light on unexpected inter-class similarities and intra-class differences.
Bioinformatics
What problem does this paper attempt to address?