Abstract: The debate between self-interpretable models and post-hoc explanations for black-box models is central to Explainable AI (XAI). Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability. Conversely, post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive. To bridge the gap between these two lines of research, we propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models without compromising prediction accuracy. Specifically, we introduce a parameter-efficient pipeline, *AutoGnothi*, which integrates a small side network into the black-box model, allowing it to generate Shapley value explanations without changing the original network parameters. This side-tuning approach significantly reduces memory, training, and inference costs, outperforming traditional parameter-efficient methods, where full fine-tuning serves as the optimal baseline. *AutoGnothi* enables the black-box model to predict and explain its predictions with minimal overhead. Extensive experiments show that *AutoGnothi* offers accurate explanations for both vision and language tasks, delivering superior computational efficiency with comparable interpretability.
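To make the cost of Shapley values concrete, the following is a minimal sketch of exact Shapley computation on a toy set function. It is not the paper's method: `exact_shapley` and `toy_model` are hypothetical names introduced for illustration, and the toy scoring rule is an assumption chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values for a set function over n_features players.

    value_fn takes a frozenset of present (unmasked) feature indices and
    returns a float. The function is evaluated on all 2**n_features subsets.
    """
    players = range(n_features)

    # Evaluate every coalition once and cache the result.
    cache = {}
    for size in range(n_features + 1):
        for subset in combinations(players, size):
            coalition = frozenset(subset)
            cache[coalition] = value_fn(coalition)

    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_features):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (cache[coalition | {i}] - cache[coalition])
    return phi, len(cache)


def toy_model(present):
    """Hypothetical stand-in for a model evaluated on a masked input."""
    score = 0.0
    if 0 in present:
        score += 2.0
    if 1 in present:
        score += 1.0
    if 0 in present and 2 in present:
        score += 1.0  # interaction between features 0 and 2
    return score


phi, n_evals = exact_shapley(toy_model, 3)
print(phi, n_evals)
```

With three features the toy function is evaluated 8 times, and the attributions sum to the difference between the full-input score and the empty-input score. The number of evaluations doubles with every additional feature, which is why exact computation is impractical for inputs with hundreds of image patches or tokens.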

What problem does this paper attempt to address?

The core problem this paper attempts to solve is making black-box models self-interpretable without sacrificing prediction performance, while significantly reducing training, memory, and inference costs. Specifically, the authors aim to bridge the gap between self-interpretable models and post-hoc explanation methods, and propose a method that gives black-box models theoretically guaranteed self-interpretability without changing the original network parameters.

### Problem Background

In the field of Explainable Artificial Intelligence (XAI), there are two main approaches:

1. **Self-interpretable models**: For example, concept-based networks, which provide insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability.
2. **Post-hoc explanation methods**: For example, Shapley values, which have a solid theoretical foundation but are computationally expensive and resource-intensive.

### Proposed Method

To address these problems, the authors propose a method named AutoGnothi. AutoGnothi integrates a small side network into the black-box model so that the model can generate Shapley value explanations. The main advantages of this method are:

- **Parameter-efficient**: It adds the explanation function with only a small number of additional parameters, greatly reducing memory and computational costs.
- **Maintaining prediction performance**: It achieves self-interpretability without affecting the prediction accuracy of the original model.
- **Applicable to multiple tasks**: Experiments show that AutoGnothi produces accurate explanations on both vision and language tasks.

### Key Contributions

1. **Efficient explanation**: Through a Parameter-Efficient Transfer Learning (PETL) pipeline, AutoGnothi makes any black-box model (such as a Transformer) self-interpretable without modifying the parameters of the original task. Compared with existing methods, it significantly improves training, inference, and memory efficiency.
2. **Self-interpretability**: It achieves theoretically guaranteed self-interpretability through Shapley values without affecting the prediction accuracy of the original model.
3. **Applicable to vision and language models**: Experiments on commonly used models such as ViT (for image classification) and BERT (for sentiment analysis) show that AutoGnothi produces accurate explanations at much lower cost. For example, on the ImageNette dataset, AutoGnothi reduces the trainable parameters by 97% and the training memory by 72%, while maintaining comparable accuracy.

### Method Overview

The core idea of AutoGnothi is to reduce training and memory costs through side-tuning. The specific steps are as follows:

1. **Obtain the proxy model**: Apply side-tuning directly to the black-box model, using additional side branches to predict on masked inputs.
2. **Obtain the explainer model**: Use a similar side-tuned feature backbone and add fully connected layers as an explanation head to generate explanations.
3. **Generate explanations**: AutoGnothi then generates predictions and explanations simultaneously in a single inference pass, significantly improving inference efficiency.

Through these innovations, AutoGnothi improves the interpretability of black-box models while significantly reducing the consumption of computational resources, making it more feasible in practical applications.
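The side-tuning structure described above can be sketched as follows. This is a structural illustration only, under stated assumptions: the class name `SideTunedModel`, the layer sizes, the single-layer backbone, and the random weights are all hypothetical, no training loop is shown, and plain Python lists are used in place of a tensor library to keep the sketch dependency-free. It is not the authors' implementation.

```python
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def n_params(*matrices):
    return sum(len(m) * len(m[0]) for m in matrices)


class SideTunedModel:
    """Frozen backbone plus a small trainable side branch (hypothetical sizes)."""

    def __init__(self, n_features=8, hidden=64, side_hidden=4, n_classes=3, seed=0):
        rng = random.Random(seed)
        # Frozen "black-box" parameters: these would never be updated.
        self.backbone_layer = rand_matrix(rng, hidden, n_features)
        self.prediction_head = rand_matrix(rng, n_classes, hidden)
        # Trainable side parameters: a narrow projection of backbone features
        # and a fully connected explanation head.
        self.side_layer = rand_matrix(rng, side_hidden, hidden)
        self.explanation_head = rand_matrix(rng, n_features * n_classes, side_hidden)
        self.n_features, self.n_classes = n_features, n_classes

    def forward(self, x):
        features = relu(linear(self.backbone_layer, x))   # computed once, shared
        logits = linear(self.prediction_head, features)   # original prediction
        side = relu(linear(self.side_layer, features))    # side branch reads features
        flat = linear(self.explanation_head, side)
        # One attribution score per (feature, class) pair.
        attributions = [flat[i * self.n_classes:(i + 1) * self.n_classes]
                        for i in range(self.n_features)]
        return logits, attributions


model = SideTunedModel()
x = [0.1 * i for i in range(8)]
logits, attributions = model.forward(x)  # prediction and explanation in one pass

frozen = n_params(model.backbone_layer, model.prediction_head)
trainable = n_params(model.side_layer, model.explanation_head)
print(f"frozen={frozen}, trainable={trainable}")
```

Because the prediction path never reads from the side branch, changing the side parameters cannot change the logits, which is the sense in which prediction accuracy is preserved. In a trained system the explanation head would be fitted so that its outputs approximate Shapley values; the random weights here only show the data flow.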

Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models
