Can We Debias Multimodal Large Language Models Via Model Editing?

Zecheng Wang, Xinye Li, Zhanyue Qin, Chunshan Li, Zhiying Tu, Dianhui Chu, Dianbo Sui
DOI: https://doi.org/10.1145/3664647.3681589
2024-01-01
Abstract: Multimodal large language models (MLLMs) have been observed to exhibit biases originating from their training datasets. Unlike in unimodal LLMs, biases in MLLMs may stem from interactions between multiple modalities, which increases the complexity of multimodal debiasing. Conventional approaches to alleviating model biases, such as fine-tuning, are costly and data-hungry. Model editing methods, which focus on post-hoc modifications of model knowledge, have recently demonstrated significant potential across diverse applications. These methods can effectively and precisely adjust model behavior in specific knowledge domains while minimizing the impact on overall model performance. However, there is currently no comprehensive study that applies model editing methods to debiasing MLLMs and analyzes their pros and cons. To facilitate research in this field, we define the debiasing problem of MLLMs as an editing problem and propose a novel set of evaluation metrics for MLLM debias editing. Through various experiments, we demonstrate that: (1) Existing model editing methods can effectively alleviate biases in MLLMs and generalize well to semantically equivalent image-text pairs. However, most methods tend to adversely affect the stability of the MLLM. (2) Compared to editing the visual modality of the MLLM, editing the textual modality yields better results in addressing MLLM biases. (3) Model-editing-based debiasing methods can generalize across different types of biases.