LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang,Nikolaos Dimitriadis,Alessandro Favero,Guillermo Ortiz-Jimenez,Francois Fleuret,Pascal Frossard
2024-10-23
Abstract:Large pre-trained models exhibit impressive zero-shot performance across diverse tasks, but fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks. To address this challenge, we introduce LiNeS, Layer-increasing Network Scaling, a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS scales parameter updates linearly based on their layer depth within the network, maintaining shallow layers close to their pre-trained values to preserve general features while allowing deeper layers to retain task-specific representations. We further extend this approach to multi-task model merging scenarios, where layer-wise scaling of merged parameters reduces negative task interference. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. It mitigates forgetting, enhances out-of-distribution generalization, integrates seamlessly with existing multi-task model merging baselines improving their performance across benchmarks and model sizes, and can boost generalization when merging LLM policies aligned with different rewards via RLHF. Importantly, our method is simple to implement and complementary to many existing techniques.
Machine Learning,Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve the problem of catastrophic forgetting that occurs when fine - tuning large pre - trained models, that is, while optimizing the performance of specific tasks, it leads to a decline in the generalization ability of the model on other tasks. In addition, the paper also explores the problem of task interference when merging multi - task models, which will further reduce the overall performance of the model. ### Problem Background 1. **Catastrophic Forgetting**: When fine - tuning a pre - trained model to adapt to a specific task, although significant improvement can be achieved on the target task, it often impairs the zero - shot generalization ability of the model on other tasks. 2. **Multi - task Model Merging**: When merging multiple models fine - tuned for different tasks into one model, the interference between tasks will lead to performance degradation. ### Solution To solve these problems, the authors propose LiNeS (Layer - increasing Network Scaling), a post - training editing technique that linearly scales parameter updates based on layer depth. Specifically: - **Single - task Fine - tuning**: By reducing the magnitude of the updates in the shallow network, keeping the shallow network close to the pre - trained state, thereby retaining general features; while allowing the deep network to retain task - specific representations. - **Multi - task Model Merging**: When merging models of multiple tasks, by applying layer - by - layer scaling to the merged parameters, the interference between tasks is reduced and the multi - task performance is improved. ### Main Contributions 1. **Proposing the LiNeS Method**: By scaling parameter updates layer - by - layer, it effectively alleviates catastrophic forgetting and shows excellent performance in multiple benchmark tests. 2. **Enhancing Multi - task Model Merging**: Significantly improves the performance of existing multiple multi - task model merging methods, especially in the fields of computer vision and natural language processing. 3. **Simple and Easy to Implement**: The LiNeS method is simple and easy to integrate into existing model merging frameworks and can be widely applied to different models and task settings. ### Experimental Verification The paper verifies the effectiveness of LiNeS through a large number of experiments, including but not limited to: - **Single - task Fine - tuning**: In the image classification task, LiNeS can restore the generalization ability of the pre - trained model while maintaining the performance of the target task. - **Multi - task Model Merging**: On multiple benchmark datasets, LiNeS significantly improves the effect of multi - task model merging and reduces the interference between tasks. - **Robust Fine - tuning**: Combined with the WiSE - FT method, LiNeS further enhances the generalization ability of the model on out - of - distribution data. In conclusion, LiNeS provides a simple and effective solution that can retain task - specific knowledge and maintain the generalization ability of the pre - trained model during fine - tuning and multi - task model merging.