FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts

Shenghe Zheng, Hongzhi Wang
2024-11-25
Abstract: In the current era of rapid expansion in model scale, there is an increasing availability of open-source model weights for various tasks. However, the capabilities of a single fine-tuned model often fall short of meeting diverse deployment needs. Model merging has thus emerged as a widely focused method for efficiently building a single model tailored for multiple tasks combined from existing models. Nevertheless, existing model merging methods face challenging trade-offs between performance and deployment costs, primarily due to task conflicts within the merged network. Our analysis of neural networks reveals that some task-specific information introduced by fine-tuning minimally enhances performance but heavily impacts generalization, leading to task conflicts. To mitigate the impact of this information, we propose FR-Merging, an innovative method that leverages frequency domain information to efficiently filter harmful specialized information, thereby minimizing the impact of task conflicts on the backbone with minimal cost. Since performance loss is inevitable with cost-free merging methods, we introduce a lightweight task-specific expert that can be dynamically integrated during inference to compensate for information loss. This framework, FREE-Merging (FR-Merging with lightweight experts), strikes a balanced trade-off between training cost, inference speed, storage requirements, and performance. We demonstrate the effectiveness of both FR-Merging and FREE-Merging on multiple tasks across CV, NLP, and Multi-Modal domains and show that they can be flexibly adapted to meet specific needs.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses several key challenges in multi-task model merging:

1. **Limited capability of a single fine-tuned model**: A single fine-tuned model often cannot meet diverse deployment requirements, especially when multiple tasks must be handled.
2. **Trade-off between performance and deployment cost in existing merging methods**: Existing model merging methods face a difficult trade-off between performance and deployment cost, mainly because of task conflicts within the merged network. These conflicts degrade the merged model's performance on the individual tasks.
3. **Impact of task-specific information on generalization**: The authors' analysis shows that some task-specific information introduced by fine-tuning improves performance only slightly while seriously harming generalization, which causes task conflicts.

To solve these problems, the authors propose the **FREE-Merging** framework, which has two main parts:

- **FR-Merging**: Uses the Fourier transform to filter harmful specialized information in the frequency domain, reducing the impact of task conflicts on the backbone network at minimal cost. A high-pass filter removes low-frequency signals and retains high-frequency signals, which improves the model's generalization ability.
- **Lightweight Experts**: Lightweight task-specific experts are dynamically integrated during inference to compensate for the information lost in merging and to preserve task-specific capabilities. This lets the merged model adapt to multiple task requirements while recovering performance that merging alone would lose.

### Summary

The main contributions of this paper are:

1. Identifying the relationship between the low-frequency region of fine-tuned parameters and the model's generalization ability, which helps reduce task conflicts.
2. Theoretically verifying that new information must be introduced in model merging, and proposing an effective way to construct lightweight experts that make up for the information lost during merging.
3. Verifying, through extensive experiments, the effectiveness of FR-Merging and FREE-Merging on a range of tasks in the vision, language, and multi-modal domains.

The method improves the model's generalization ability and performance while balancing storage, inference speed, and performance, which makes it suitable for application scenarios such as edge devices.
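
The FR-Merging step described above (high-pass filtering fine-tuned parameters in the frequency domain before merging) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `high_pass_filter` and `merge_with_high_pass`, the `cutoff_ratio` and `scale` parameters, the circular frequency mask, and the plain summation of filtered task vectors are all assumptions made for illustration; the summary does not specify how the filter is shaped or how the filtered vectors are combined.

```python
import numpy as np


def high_pass_filter(task_vector: np.ndarray, cutoff_ratio: float = 0.1) -> np.ndarray:
    """Remove the low-frequency band of a 2-D task vector (hypothetical helper).

    task_vector  : fine-tuned weights minus pretrained weights, shape (rows, cols)
    cutoff_ratio : assumed cutoff in cycles per sample; bins closer than this
                   to the zero frequency are zeroed out
    """
    rows, cols = task_vector.shape
    # Shift the spectrum so the zero-frequency bin sits at (rows // 2, cols // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(task_vector))
    # Frequency of every bin along each axis, matching the shifted layout.
    fy = (np.arange(rows) - rows // 2) / rows
    fx = (np.arange(cols) - cols // 2) / cols
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # High-pass: drop everything inside the low-frequency disc.
    spectrum[radius < cutoff_ratio] = 0.0
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum))
    # The mask is symmetric, so the imaginary part is only rounding noise.
    return filtered.real


def merge_with_high_pass(pretrained, finetuned_list, cutoff_ratio=0.1, scale=1.0):
    """Add the high-pass-filtered task vectors to the pretrained weights.

    Summing the filtered vectors with a single `scale` is an assumption,
    borrowed from simple task-vector addition.
    """
    merged = np.array(pretrained, dtype=float)
    for finetuned in finetuned_list:
        task_vector = np.asarray(finetuned, dtype=float) - pretrained
        merged += scale * high_pass_filter(task_vector, cutoff_ratio)
    return merged
```

For example, a task vector consisting of a constant offset (purely low-frequency) plus a small checkerboard pattern (purely high-frequency) comes out of `high_pass_filter` with the offset removed and the checkerboard intact. In practice the filter would be applied per weight matrix, and `cutoff_ratio` would need tuning.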