Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models

Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang
2024-10-29
Abstract: The debate between self-interpretable models and post-hoc explanations for black-box models is central to Explainable AI (XAI). Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability. Conversely, post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive. To bridge the gap between these two lines of research, we propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models without compromising prediction accuracy. Specifically, we introduce a parameter-efficient pipeline, *AutoGnothi*, which integrates a small side network into the black-box model, allowing it to generate Shapley value explanations without changing the original network parameters. This side-tuning approach significantly reduces memory, training, and inference costs, outperforming traditional parameter-efficient methods, where full fine-tuning serves as the optimal baseline. *AutoGnothi* enables the black-box model to predict and explain its predictions with minimal overhead. Extensive experiments show that *AutoGnothi* offers accurate explanations for both vision and language tasks, delivering superior computational efficiency with comparable interpretability.
Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Computer Science and Game Theory
What problem does this paper attempt to address?
The core problem this paper addresses is how to make black-box models self-interpretable without sacrificing prediction performance, while significantly reducing training, memory, and inference costs. Specifically, the authors aim to bridge the gap between self-interpretable models and post-hoc explanation methods, proposing a method that gives black-box models theoretically guaranteed self-interpretability without changing the original network parameters.

### Problem Background

In Explainable Artificial Intelligence (XAI), there are two main approaches:

1. **Self-interpretable models**: For example, concept-based networks, which provide insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability.
2. **Post-hoc explanation methods**: For example, Shapley values, which have a solid theoretical foundation but are computationally expensive and resource-intensive.

### Proposed Method

To address these problems, the authors propose AutoGnothi, which integrates a small side network into the black-box model so that the model can generate Shapley value explanations. The main advantages of this method are:

- **Parameter-efficient**: It adds the explanation capability with only a small number of additional parameters, greatly reducing memory and computational costs.
- **Maintaining prediction performance**: It achieves self-interpretability without affecting the prediction accuracy of the original model.
- **Applicable to multiple tasks**: Experiments show that AutoGnothi produces accurate explanations on both vision and language tasks.

### Key Contributions

1. **Efficient explanation**: Through a Parameter-Efficient Transfer Learning (PETL) pipeline, AutoGnothi makes a black-box model (such as a Transformer) self-interpretable without modifying the original task parameters. Compared with existing methods, it improves training, inference, and memory efficiency.
2. **Self-interpretability**: It achieves theoretically guaranteed self-interpretability through Shapley values without affecting the prediction accuracy of the original model.
3. **Applicable to vision and language models**: Experiments on commonly used models such as ViT (for image classification) and BERT (for sentiment analysis) show that AutoGnothi produces accurate explanations with comparable interpretability at lower computational cost. For example, on the ImageNette dataset, AutoGnothi reduces trainable parameters by 97% and training memory by 72% while maintaining comparable accuracy.

### Method Overview

The core idea of AutoGnothi is to reduce training and memory costs through side-tuning. The steps are as follows:

1. **Obtain the proxy model**: Apply side-tuning directly to the black-box model, adding a side branch that predicts on masked inputs.
2. **Obtain the interpreter model**: Use a similar side-tuned feature backbone and add fully connected layers as an explanation head to generate explanations.
3. **Generate explanations**: AutoGnothi produces the prediction and its explanation together in a single inference pass, significantly improving inference efficiency.

Through these innovations, AutoGnothi improves the interpretability of black-box models while significantly reducing computational cost, making it more practical to deploy.
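
To make the cost argument behind this summary concrete, the following is a minimal sketch of exact Shapley value computation by subset enumeration. It is not AutoGnothi's implementation: `toy_value` is a hypothetical stand-in for a model's score on a masked input (the set `kept` lists the features left unmasked), and `exact_shapley` is an illustrative helper name. The sketch only shows why exact attribution is expensive and therefore why a learned explanation head is attractive.

```python
from itertools import combinations
from math import factorial


def toy_value(kept):
    """Hypothetical model score when only the features in `kept` are unmasked."""
    score = 0.0
    if 0 in kept:
        score += 2.0
    if 1 in kept:
        score += 1.0
    if 0 in kept and 2 in kept:  # interaction between features 0 and 2
        score += 3.0
    return score


def exact_shapley(value_fn, n_features):
    """Exact Shapley values via the subset formula.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))

    Returns the attributions and the number of value-function evaluations.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    calls = 0
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_features):
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
                calls += 2
    return phi, calls


phi, calls = exact_shapley(toy_value, 3)
print(phi, calls)
```

On this toy game the attributions sum to the score with all features kept minus the score with none kept (2 + 1 + 3 = 6), which is the efficiency property of Shapley values. The enumeration makes n · 2^n value-function calls, so 20 features already need roughly 21 million model evaluations; this is the cost that post-hoc estimators approximate and that a single-pass explanation head avoids at inference time.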