MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu
2024-10-02
Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechanism of how MLLMs process features from diverse domains. Furthermore, we propose a three-stage mechanism for language model modules in MLLMs when handling projected image features, and verify this hypothesis using logit lens. Extensive experiments indicate that while current MLLMs exhibit Visual Question Answering (VQA) capability, they may not fully utilize domain-specific information. Manipulating domain-specific neurons properly will result in a 10% change of accuracy at most, shedding light on the development of cross-domain, all-encompassing MLLMs in the future. The source code is available at https://github.com/Z1zs/MMNeuron.
Computation and Language
What problem does this paper attempt to address?
This paper asks whether multimodal large language models (MLLMs) contain domain-specific neurons when they process features from different visual domains (similar to the neurons found in multilingual neuron analysis), and how these neurons affect the model's ability to understand and process images and language instructions.

### Background and Motivation of the Paper

1. **Neuron Analysis**: In recent years, neuron analysis has been widely used in computer vision and natural language processing. Analyzing the activation patterns of neurons helps explain a model's internal working mechanisms. Previous studies have confirmed that some neurons play important roles in learning specific concepts, retaining factual knowledge, and solving specific tasks.
2. **Visual Representations in Multimodal Models**: Current multimodal large language models (such as LLaVA and InstructBLIP) extract image features with pre-trained visual encoders and project them into the word embedding space. The projected features are combined with language features and fed into the language model to generate text output. However, the specific mechanism of this framework is still unclear, especially how image features are processed and understood inside the language model.
3. **Cross-Domain Multimodal Models**: Although some studies have tried to improve general-domain multimodal models in specific domains through fine-tuning, general-domain models still show some cross-domain capability without further fine-tuning. It is therefore important to understand how these models perform in different domains and what internal mechanisms are involved.

### Research Objectives

- **Identify Domain-Specific Neurons**: Identify domain-specific neurons in multimodal large language models and explore their roles in processing features from different domains.
- **Analyze the Influence of Domain-Specific Neurons**: Evaluate the influence of domain-specific neurons on model performance, especially their roles in visual question answering (VQA) tasks.
- **Propose a Multi-stage Mechanism**: Based on the distribution of domain-specific neurons, propose a three-stage mechanism to explain how multimodal features are processed in the language model.

### Main Contributions

- **First Identification of Domain-Specific Neurons in the Multimodal Field**: Domain-specific neurons are identified through the Domain-Activation-Probability-Entropy (DAPE) method.
- **Analysis of the Influence of Domain-Specific Neurons**: Experimental results show that current multimodal models do not fully utilize domain-specific information, especially in some specific domains.
- **Propose a Three-stage Framework**: Based on the distribution of domain-specific neurons, a three-stage framework is proposed to explain how image features are processed in the language model.

### Methods

1. **Neuron Activation Detection**: Define the activation patterns of neurons in the vision-language model, and detect whether a neuron is activated from the output of the activation function in the feed-forward network (FFN) layer.
2. **Domain-Specific Neuron Selection**: Use the Domain-Activation-Probability-Entropy (DAPE) method to select domain-specific neurons.
3. **Latent Embedding Explanation**: Decode the hidden states of the language model's intermediate layers with the logit lens method, and observe how multimodal features are transformed inside the language model.

### Experimental Results

- **Distribution of Domain-Specific Neurons**: Domain-specific neurons are detected in different modules, and these neurons are found mainly in the shallow and intermediate layers.
- **Influence of Domain-Specific Neurons**: Experimental results show that deactivating domain-specific neurons has a larger impact on performance in some domains and a smaller impact in others.
- **Verification of the Three-stage Mechanism**: The proposed three-stage mechanism is verified with the logit lens method, which shows how multimodal features are processed in the language model.

### Conclusion

This study is the first to identify domain-specific neurons in the multimodal field, and it proposes a three-stage mechanism to explain how multimodal features are processed in the language model. These findings help explain the internal working mechanisms of multimodal models and offer a new perspective for developing more comprehensive, cross-domain multimodal models in the future.
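
The neuron selection steps listed under Methods can be illustrated with a small NumPy sketch. The summary gives only the name Domain-Activation-Probability-Entropy, so the formulation here is an assumption inferred from that name: a neuron counts as activated when its FFN post-activation value is positive, its activation probability is estimated per domain, those probabilities are normalized across domains, and neurons with the lowest entropy are selected. The function names, the `fraction` parameter, and the toy data are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def activation_probability(activations, domain_ids, num_domains):
    """Estimate how often each neuron fires in each domain.

    activations: (num_tokens, num_neurons) FFN post-activation values.
    domain_ids:  (num_tokens,) integer domain label of each token.
    Returns a (num_neurons, num_domains) array of firing frequencies.
    Assumption: a neuron "fires" when its post-activation value is > 0.
    """
    fired = activations > 0
    probs = np.zeros((activations.shape[1], num_domains))
    for d in range(num_domains):
        mask = domain_ids == d
        if mask.any():
            probs[:, d] = fired[mask].mean(axis=0)
    return probs

def dape_scores(probs, eps=1e-12):
    """Entropy of each neuron's activation distribution over domains."""
    total = probs.sum(axis=1, keepdims=True)
    dist = probs / np.maximum(total, eps)          # normalize across domains
    entropy = -(dist * np.log(dist + eps)).sum(axis=1)
    entropy[total[:, 0] < eps] = np.inf            # never-firing neurons are not candidates
    return entropy

def select_domain_specific(probs, fraction=0.01):
    """Indices of the lowest-entropy neurons (hypothetical selection rule)."""
    scores = dape_scores(probs)
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[:k]

# Toy data: 4 tokens, 3 neurons, 2 domains.
acts = np.array([[1.0, 1.0, -1.0],
                 [1.0, 1.0, -1.0],
                 [-1.0, 1.0, -1.0],
                 [-1.0, 1.0, -1.0]])
domains = np.array([0, 0, 1, 1])
probs = activation_probability(acts, domains, num_domains=2)
scores = dape_scores(probs)
selected = select_domain_specific(probs, fraction=0.34)
```

In the toy data, neuron 0 fires only on domain 0 tokens, so its entropy is close to zero and it is the one selected. Neuron 1 fires equally in both domains and gets the maximum entropy for two domains, and neuron 2 never fires and is excluded.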
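
The logit lens step mentioned under Methods can be sketched in the same way: each intermediate hidden state is projected through the model's output (unembedding) matrix and read as a distribution over vocabulary tokens. This is a minimal NumPy illustration with made-up dimensions, not the authors' implementation. The parameter-free `layer_norm` is a simplifying assumption that stands in for the model's final normalization layer, and `logit_lens` is a hypothetical helper name.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Parameter-free normalization over the feature axis (simplifying assumption)."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def logit_lens(hidden_states, unembed, top_k=1):
    """Decode intermediate hidden states into vocabulary token ids.

    hidden_states: (num_layers, num_positions, d_model)
    unembed:       (d_model, vocab_size) output projection of the language model
    Returns (num_layers, num_positions, top_k) token ids, highest logit first.
    """
    logits = layer_norm(hidden_states) @ unembed
    order = np.argsort(-logits, axis=-1)
    return order[..., :top_k]

# Toy example: 2 layers, 1 position, d_model = vocab_size = 4.
# With an identity unembedding, the decoded token is simply the
# index of the largest hidden-state component.
hs = np.array([[[0.0, 3.0, 1.0, 0.0]],
               [[5.0, 0.0, 1.0, 0.0]]])
top_tokens = logit_lens(hs, np.eye(4))
```

With a real model, `hidden_states` would hold the per-layer outputs at the image-token positions and `unembed` would be the language model's output embedding. Comparing the decoded tokens across layers is what lets one observe how projected image features change with depth.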