Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang, Meng Wang
2024-06-11
Abstract: Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continuous improvements in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability, namely how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on conventional work in efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method that helps mitigate sensitive words or hallucination. As the rationale for our design, we provide a theoretical assumption. For empirical evaluation, we conduct extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods but also offer insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity, and causal effect, shedding light on future research.
Computation and Language
What problem does this paper attempt to address?
The paper proposes a new approach to identifying and manipulating multimodal neurons, which play a crucial role in pretrained Transformer-based multimodal language models. The researchers argue that multimodal neurons are essential for understanding images and generating textual descriptions, but that existing methods for identifying them are inefficient and limited in applicability. To address this, they define a contribution score based on activation outputs that measures how much each neuron contributes to a specific concept. Because the score requires no gradient computation, identification becomes more efficient. Building on the identified neurons, the paper also presents a multimodal knowledge editing approach that edits specific concepts in the model parameters without retraining the entire model.
The main contributions of the paper are as follows:
1. Introducing a new approach to identify multimodal neurons in Transformers.
2. Designing a multimodal knowledge editing method based on these neurons to control model outputs.
3. Experimentally revealing three critical properties of multimodal neurons (sensitivity, specificity, and causal effect) and designing corresponding evaluation metrics.
In the experiments, the researchers used several widely used visual semantic understanding models and evaluated on the SBU Captions dataset to validate the effectiveness of the proposed methods. The results indicate that the approach can accurately identify neurons associated with semantic concepts, and that these neurons exhibit invariance across different regions and images, which points to their sensitivity and specificity to particular concepts.
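The summary above does not give the exact formula for the contribution score, so the following is only a minimal sketch of one plausible gradient-free reading: a feed-forward neuron's score for a concept token is its activation multiplied by how strongly its output direction projects onto that token's unembedding row. All names here (`contribution_scores`, `top_k_neurons`, `ablate_neurons`, `w_out`, `unembed_row`) are hypothetical and are not taken from the paper. Zeroing the selected columns is a deliberately simple stand-in for parameter editing, not the authors' editing method.

```python
from typing import List

Matrix = List[List[float]]  # row-major: matrix[row][col]


def contribution_scores(activations: List[float], w_out: Matrix,
                        unembed_row: List[float]) -> List[float]:
    """Assumed score: score[i] = a_i * <unembed_row, w_out[:, i]>.

    activations: FFN hidden activations, length d_ff
    w_out:       second FFN projection, shape (d_model, d_ff)
    unembed_row: unembedding vector of the concept token, length d_model
    Only forward-pass quantities are used; no gradients are needed.
    """
    d_model = len(w_out)
    scores = []
    for i, a in enumerate(activations):
        # Projection of neuron i's output direction onto the token direction.
        direction = sum(unembed_row[d] * w_out[d][i] for d in range(d_model))
        scores.append(a * direction)
    return scores


def top_k_neurons(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring neurons, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def token_logit(activations: List[float], w_out: Matrix,
                unembed_row: List[float]) -> float:
    """This layer's total contribution to the token logit (sum of scores)."""
    return sum(contribution_scores(activations, w_out, unembed_row))


def ablate_neurons(w_out: Matrix, neurons: List[int]) -> Matrix:
    """Return a copy of w_out with the given neuron columns set to zero."""
    blocked = set(neurons)
    return [[0.0 if i in blocked else v for i, v in enumerate(row)]
            for row in w_out]


# Toy example: d_model = 2, d_ff = 4 (values are made up for illustration).
activations = [0.5, 2.0, 0.0, 1.0]
w_out = [[1.0, 0.5, 3.0, -1.0],
         [0.0, 1.0, 2.0, 0.5]]
unembed_row = [1.0, 2.0]

scores = contribution_scores(activations, w_out, unembed_row)
key_neurons = top_k_neurons(scores, k=1)
edited_w_out = ablate_neurons(w_out, key_neurons)

print("scores:", scores)
print("key neurons:", key_neurons)
print("logit before edit:", token_logit(activations, w_out, unembed_row))
print("logit after edit:", token_logit(activations, edited_w_out, unembed_row))
```

In this toy setup the third neuron has the largest projection onto the token direction but is inactive, so it scores zero. This illustrates why an activation-based score differs from ranking neurons by weights alone. Removing the single top-scoring neuron also accounts for most of the drop in the token logit, which is the kind of causal effect the summary describes.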