Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Kerem Zaman, Leshem Choshen, Shashank Srivastava
2024-10-10
Abstract: Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper explores whether model fusion can reduce unwanted knowledge (such as shortcuts, social biases, and memorization of training data) and evaluates its effectiveness as a debiasing tool. Specifically, the paper addresses the issue through the following points:

1. **Research Background and Objectives**:
   - Model fusion is typically used to enhance performance, but how it affects which knowledge is preserved and which is forgotten across the fused models is not yet clear.
   - The authors hypothesize that shared knowledge will be retained during model fusion, while non-shared knowledge will be forgotten or degraded.
2. **Experimental Design and Methods**:
   - Using controlled experiments (such as injecting synthetic shortcuts) and real-world datasets (such as PAN16), the study investigates the impact of model fusion on different types of biases and on memorization.
   - The BERT model is used for classification tasks, and the changes in knowledge are analyzed by interpolating between model weights.
3. **Main Findings**:
   - In synthetic shortcut experiments, model fusion effectively forgets non-shared shortcuts while retaining shared knowledge.
   - For social biases (gender and age biases), model fusion significantly reduces biases (by about 60%) while maintaining high accuracy.
   - Regarding memorization in language models (such as GPT-2), model fusion can reduce the memorization of non-shared data, thereby helping to alleviate privacy issues.

In summary, through a series of experiments, this paper supports the potential of model fusion as an effective debiasing tool and demonstrates its value in reducing the memorization of training data.
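To make "combining their weights" concrete, here is a minimal sketch of fusion as a weighted average of parameters, with two-model interpolation as a special case. It assumes every model has the same architecture and parameter names. The names `fuse`, `interpolate`, and `Params`, and the toy parameter values, are hypothetical and chosen for illustration; this is not the authors' code, and real models would use tensors rather than Python lists.

```python
from typing import Dict, List, Optional

# Assumption: a model is a mapping from parameter name to a flat list of values.
Params = Dict[str, List[float]]


def fuse(models: List[Params], coeffs: Optional[List[float]] = None) -> Params:
    """Fuse models by taking a weighted average of each parameter."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        # Default to a uniform average over all models.
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    fused: Params = {}
    for name, ref in models[0].items():
        # All models must expose the same parameter with the same size.
        for m in models:
            if name not in m or len(m[name]) != len(ref):
                raise ValueError(f"parameter {name!r} does not match across models")
        fused[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(len(ref))
        ]
    return fused


def interpolate(a: Params, b: Params, alpha: float) -> Params:
    """Linear interpolation between two models: (1 - alpha) * a + alpha * b."""
    return fuse([a, b], [1.0 - alpha, alpha])


# Toy values (hypothetical): both models agree on "shared" and differ on "private".
model_a = {"shared": [2.0, 2.0], "private": [4.0, 0.0]}
model_b = {"shared": [2.0, 2.0], "private": [0.0, -4.0]}

fused = fuse([model_a, model_b])
print(fused)
```

In this toy case the parameters the two models agree on pass through averaging unchanged, while the parameters they disagree on shrink in magnitude. That is only arithmetic intuition for the shared-versus-unshared pattern summarized above; it is not evidence for it, and the paper's findings rest on its experiments with fine-tuned models.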