Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Kerem Zaman, Leshem Choshen, Shashank Srivastava

2024-10-10

Abstract: Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper explores whether model fusion can reduce unwanted knowledge (shortcuts, social biases, and memorized training data) and evaluates its effectiveness as a debiasing tool. Specifically, the paper addresses the issue through the following points:

1. **Research Background and Objectives**:
   - Model fusion is typically used to enhance performance, but how it affects which knowledge is kept and which is forgotten across the fused models is not yet clear.
   - The authors hypothesize that shared knowledge will be retained during model fusion, while non-shared knowledge will be forgotten or degraded.
2. **Experimental Design and Methods**:
   - Using controlled experiments (such as injecting synthetic shortcuts) and real-world datasets (such as PAN16), the study investigates the impact of model fusion on different types of bias and on memorization.
   - BERT models are used for the classification tasks, and changes in knowledge are analyzed through interpolation between models.
3. **Main Findings**:
   - In the synthetic shortcut experiments, model fusion effectively forgets non-shared shortcuts while retaining shared knowledge.
   - For social biases (gender and age), model fusion significantly reduces bias (by about 60%) while maintaining high accuracy.
   - For memorization in language models (such as GPT-2), model fusion reduces the memorization of non-shared data, which helps alleviate privacy concerns.

In summary, through a series of experiments, the paper supports the potential of model fusion as an effective debiasing tool and demonstrates its value in reducing models' memorization of training data.
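To make the idea of fusing models "by combining their weights" concrete, here is a minimal sketch of weight averaging and two-model interpolation. It assumes each model is represented as a plain dictionary mapping parameter names to flat lists of floats; the names `fuse_weights` and `interpolate` are hypothetical and chosen for illustration only. The summary above does not spell out the paper's exact fusion procedure, so this should not be read as the authors' implementation.

```python
from typing import Dict, List, Optional

# Assumption: a model is a dict of parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def fuse_weights(models: List[StateDict],
                 coeffs: Optional[List[float]] = None) -> StateDict:
    """Fuse models by a weighted average of same-named parameters.

    With coeffs=None, every model gets equal weight (simple averaging).
    """
    if not models:
        raise ValueError("need at least one model to fuse")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    names = models[0].keys()
    for model in models[1:]:
        if model.keys() != names:
            raise ValueError("models must share the same parameter names")

    fused: StateDict = {}
    for name in names:
        size = len(models[0][name])
        if any(len(model[name]) != size for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all models.
        fused[name] = [
            sum(c * model[name][i] for c, model in zip(coeffs, models))
            for i in range(size)
        ]
    return fused


def interpolate(model_a: StateDict, model_b: StateDict,
                alpha: float) -> StateDict:
    """Linear interpolation: alpha=0 returns model_a, alpha=1 returns model_b."""
    return fuse_weights([model_a, model_b], [1.0 - alpha, alpha])


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0]}
    model_b = {"w": [3.0, 6.0]}
    print(fuse_weights([model_a, model_b]))
    print(interpolate(model_a, model_b, 0.25))
```

Intuitively, a behavior encoded in only one model contributes only a fraction of its original weight change after averaging, which is one way to picture why unshared knowledge might fade while shared knowledge persists. That reading is an informal illustration, not a result taken from the paper.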

Memory based fusion for multi-modal deep learning

FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

PrivFusion: Privacy-Preserving Model Fusion Via Decentralized Federated Graph Matching

Fusion Matters: Learning Fusion in Deep Click-through Rate Prediction Models

FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers

Fusing Models with Complementary Expertise

Parameter-efficient Modularised Bias Mitigation via AdapterFusion

Mitigating Memorization In Language Models

SAFE: Machine Unlearning With Shard Graphs

Modifying Memories in Transformer Models

Progressive Fusion for Multimodal Integration

Learn to Forget: Memorization Elimination for Neural Networks

SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference

Fast Model Debias with Machine Unlearning

MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection

FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

FuseChat: Knowledge Fusion of Chat Models

Continual Memorization of Factoids in Large Language Models

Cool-Fusion: Fuse Large Language Models without Training