A Concept-Based Explainability Framework for Large Multimodal Models

Jayneel Parekh,Pegah Khayatan,Mustafa Shukor,Alasdair Newson,Matthieu Cord
2024-06-12
Abstract:Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advancements towards the interpretability of these models, understanding internal representations of LMMs remains largely a mystery. In this paper, we present a novel framework for the interpretation of LMMs. We propose a dictionary learning based approach, applied to the representation of tokens. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are well semantically grounded in both vision and text. Thus we refer to these as "multi-modal concepts". We qualitatively and quantitatively evaluate the results of the learnt concepts. We show that the extracted multimodal concepts are useful to interpret representations of test samples. Finally, we evaluate the disentanglement between different concepts and the quality of grounding concepts visually and textually. We will publicly release our code.
Machine Learning,Artificial Intelligence,Computation and Language,Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to understand and interpret the internal representations of large - scale multimodal models (LMMs)**. ### Problem Background In recent years, large - scale multimodal models (LMMs), which combine unimodal encoders and large - language models (LLMs), have made remarkable progress in handling multimodal tasks. However, despite the excellent performance of these models, their internal representations are still difficult to understand. This not only limits the interpretability of the models but also affects their reliability and credibility. ### Paper Objectives To fill this gap, this paper proposes a new concept - based interpretability framework, aiming to help researchers better understand the internal representations of LMMs. Specifically, the objectives of the paper include: 1. **Propose a novel concept extraction method**: This method is based on dictionary learning and is applied to the representations of tokens in LMMs. Through this method, a multimodal concept dictionary can be learned, where each concept can be semantically grounded in the visual and textual domains. 2. **Verify the effectiveness of multimodal concepts**: Through qualitative and quantitative evaluations, prove that the extracted multimodal concepts can effectively explain the representations of test samples and have good disentanglement ability and high - quality visual and textual grounding. 3. **Provide public code**: To promote follow - up research, the author will publicly release the code implementing this method. ### Main Contributions - **Propose a concept - interpretation framework for large - scale multimodal models for the first time**: As far as the author knows, this is the first attempt to interpret such large - scale multimodal models. - **Introduce the Semi - Non - negative Matrix Factorization (Semi - NMF) optimization strategy**: Expand the previous concept - dictionary - learning strategy and propose a new optimization method. - **Verify the effectiveness of multimodal concepts through experiments**: Through qualitative and quantitative evaluations, prove that the learned concept dictionary has meaningful multimodal grounding and can effectively explain the representations of test samples. ### Method Overview The method of the paper mainly includes three steps: 1. **Select relevant images**: Select images related to the target tokens from the dataset and extract the internal representations of these images. 2. **Linearly decompose the representation matrix**: Use the dictionary - learning method to decompose the representation matrix into a concept dictionary and an activation - coefficient matrix. 3. **Semantic grounding**: Semantically ground the learned concepts in the visual and textual domains to ensure that they have practical meanings. Through these steps, the paper successfully reveals the multimodal structure of the internal representations of LMMs and provides a new perspective for understanding these complex models.