Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"

Luan Fletcher, Robert van der Klis, Martin Sedláček, Stefan Vasilev, Christos Athanasiadis
2024-10-18
Abstract: The growing reproducibility crisis in machine learning has brought forward a need for careful examination of research findings. This paper investigates the claims made by Lei et al. (2023) regarding their proposed method, LICO, for enhancing post-hoc interpretability techniques and improving image classification performance. LICO leverages natural language supervision from a vision-language model to enrich feature representations and guide the learning process. We conduct a comprehensive reproducibility study, employing (Wide) ResNets and established interpretability methods like Grad-CAM and RISE. We were mostly unable to reproduce the authors' results. In particular, we did not find that LICO consistently led to improved classification performance or improvements in quantitative and qualitative measures of interpretability. Thus, our findings highlight the importance of rigorous evaluation and transparent reporting in interpretability research.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the reproducibility crisis in machine learning by testing whether the LICO method proposed by Lei et al. (2023) is reproducible and effective. Specifically, the researchers ask the following questions:

1. **Can LICO improve model interpretability?** LICO claims to enhance post-hoc explanation techniques for image classification models by adding natural-language supervision. This study tests whether that claim holds.
2. **Can LICO improve classification performance?** LICO also claims to improve accuracy on image classification tasks. The researchers test this experimentally.
3. **Does LICO help more on tasks with more classes?** The original authors argue that LICO should show a larger advantage when there are more classes, because the prompt guidance then comes from manifolds containing more categories.

### Research Background

As the reproducibility crisis in machine learning intensifies, researchers are placing more weight on reproducing and verifying existing results. LICO proposes a training strategy that adjusts the feature space using natural-language supervision from vision-language models (such as CLIP), with the goal of improving both the interpretability and the classification performance of image classification models. These claimed effects need to be verified through independent experiments.

### Main Research Contents

To verify the effectiveness of LICO, the researchers carried out the following work:

- **Dataset selection**: CIFAR-10, CIFAR-100, Imagenette, and ImageNet were used so that the results would apply across a range of settings.
- **Experimental setup**: ResNets and Wide ResNets were used as base models, and explanation methods such as Grad-CAM and RISE were used to generate saliency maps. Metrics such as Insertion/Deletion scores and Intersection over Union (IoU) were used to quantify interpretability.
- **Loss function analysis**: An ablation study was carried out on the two LICO loss functions (Manifold Matching Loss and Optimal Transport Loss) to understand their individual roles and their combined effect.

### Experimental Results

After extensive experiments, the researchers drew the following conclusions:

- **Interpretability**: LICO did not significantly improve the interpretability of the model, either quantitatively or qualitatively. The quality of the saliency maps did not improve significantly and in some cases decreased.
- **Classification performance**: LICO reduced classification performance, and the decline was more pronounced when the amount of training data was limited.
- **Tasks with more classes**: LICO did not perform better on tasks with more classes, which contradicts the claims of the original authors.

### Conclusion

Overall, this study does not support the claim that LICO improves model interpretability or classification performance. The researchers emphasize the importance of rigorous evaluation and transparent reporting in interpretability research. They also point out that the original authors' experiments may have included undescribed operations or parameter settings, which may be one reason for the differences in results. The study illustrates how important reproducible experiments and reliable results are in machine learning, and in interpretability research in particular.
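To make the Intersection over Union (IoU) metric named in the experimental setup more concrete, here is a minimal sketch in plain Python. It is not the study's implementation: the fixed threshold of 0.5, the bounding-box format, and the helper names `threshold_mask`, `box_mask`, and `iou` are all assumptions made for illustration. The summary does not say how saliency maps were binarized or which annotations they were compared against.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def threshold_mask(saliency: List[List[float]], threshold: float) -> Mask:
    """Binarize a saliency map: True where the value reaches the threshold (assumed rule)."""
    return [[value >= threshold for value in row] for row in saliency]


def box_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Build a mask from a hypothetical (top, left, bottom, right) box, end-exclusive."""
    top, left, bottom, right = box
    return [
        [top <= r < bottom and left <= c < right for c in range(width)]
        for r in range(height)
    ]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over Union of two equally sized binary masks."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    intersection = sum(x and y for x, y in pairs)
    union = sum(x or y for x, y in pairs)
    return intersection / union if union else 0.0


# Toy 4x4 saliency map with a bright 2x2 region in the centre.
saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
predicted = threshold_mask(saliency, threshold=0.5)  # 4 pixels set
target = box_mask(4, 4, box=(1, 1, 3, 4))            # 6 pixels set
print(round(iou(predicted, target), 3))              # 4 / 6 -> 0.667
```

A higher IoU means the thresholded saliency region overlaps more with the annotated object region. The score depends on the chosen threshold, so comparisons between models are only meaningful when the same binarization rule is used for both.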