B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

Shreyash Arya, Sukrut Rao, Moritz Böhle, Bernt Schiele
2024-11-02
Abstract: B-cos Networks have been shown to be effective for obtaining highly human interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weight. B-cos variants of convolutional networks (CNNs) and vision transformers (ViTs), which primarily replace linear layers with B-cos transformations, perform competitively to their respective standard variants while also yielding explanations that are faithful by design. However, it has so far been necessary to train these models from scratch, which is increasingly infeasible in the era of large, pre-trained foundation models. In this work, inspired by the architectural similarities in standard DNNs and B-cos networks, we propose 'B-cosification', a novel approach to transform existing pre-trained models to become inherently interpretable. We perform a thorough study of design choices to perform this conversion, both for convolutional neural networks and vision transformers. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Subsequently, we apply B-cosification to a pretrained CLIP model, and show that, even with limited data and compute cost, we obtain a B-cosified version that is highly interpretable and competitive on zero-shot performance across a variety of datasets. We release our code and pre-trained model weights at https://github.com/shrebox/B-cosification.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to convert existing pre-trained deep neural networks (DNNs) into inherently interpretable models while maintaining or improving their performance and significantly reducing training cost. Specifically, the authors propose a method called "B-cosification," which fine-tunes existing pre-trained models so that they become as interpretable as B-cos models trained from scratch. B-cos models achieve high human interpretability through architectural modifications, chiefly replacing linear layers with B-cos transformations. However, training these models from scratch requires substantial compute and time, which is increasingly infeasible for large foundation models. The authors therefore explore how to reuse existing pre-trained weights and reach similar results through fine-tuning.
The main contributions of the paper include:
1. **Proposing the B-cosification technique**: A novel technique that fine-tunes existing black-box DNNs into inherently interpretable B-cos DNNs, often outperforming standard DNNs and B-cos DNNs in classification performance at a fraction of the training cost.
2. **Detailed study of design choices**: The authors conduct a thorough study of the design choices involved in the conversion to identify an effective B-cosification strategy.
3. **Application to supervised image classifiers**: The authors apply B-cosification to supervised image classifiers on ImageNet, including CNNs and ViTs. The B-cosified models are on par with B-cos DNNs trained from scratch on interpretability metrics and often outperform standard DNNs and B-cos DNNs in accuracy.
4. **Extension to CLIP models**: The authors also apply B-cosification to a pre-trained CLIP model (a large vision-language model). Even with limited data and compute, the B-cosified CLIP model is highly interpretable and remains competitive in zero-shot performance across a variety of datasets.
In summary, the paper uses B-cosification to make existing pre-trained models inherently interpretable while maintaining high performance. Because it reuses pre-trained weights instead of training from scratch, it also reduces training cost while increasing model transparency.
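To make the phrase "replacing linear layers with B-cos transformations" more concrete, the following is a minimal sketch of a single B-cos unit. It assumes the formulation used in the earlier B-cos literature, in which the linear response of a unit-norm weight vector is scaled by |cos(x, w)|^(B-1). The text above does not state this formula, and the function name `bcos_unit`, its parameters, and the example values are illustrative assumptions, not the authors' released code.

```python
import math


def bcos_unit(x, w, b=2.0, eps=1e-12):
    """Illustrative B-cos response of one unit (assumed formulation, see lead-in).

    x and w are equal-length lists of floats; b is the exponent B.
    """
    dot = sum(xi * wi for xi, wi in zip(x, w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    # Linear response using the unit-norm version of the weight vector.
    linear = dot / (w_norm + eps)
    # Cosine similarity between the input and the weight vector.
    cos = linear / (x_norm + eps)
    # Down-weight the response when input and weight are poorly aligned.
    return abs(cos) ** (b - 1.0) * linear


x = [3.0, 4.0]
w = [1.0, 0.0]
print(bcos_unit(x, w, b=1.0))           # ≈ 3.0: with B = 1 the unit is purely linear
print(bcos_unit(x, w, b=2.0))           # ≈ 1.8: a misaligned input is suppressed
print(bcos_unit([5.0, 0.0], w, b=2.0))  # ≈ 5.0: an aligned input passes unchanged
```

Under this assumed formulation, setting B = 1 recovers an ordinary linear unit with normalized weights. This is one way to read the "architectural similarities" the abstract mentions as the motivation for starting from pre-trained weights rather than training from scratch.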