Abstract: Large pre-trained models exhibit impressive zero-shot performance across diverse tasks, but fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks. To address this challenge, we introduce LiNeS, Layer-increasing Network Scaling, a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS scales parameter updates linearly based on their layer depth within the network, maintaining shallow layers close to their pre-trained values to preserve general features while allowing deeper layers to retain task-specific representations. We further extend this approach to multi-task model merging scenarios, where layer-wise scaling of merged parameters reduces negative task interference. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. It mitigates forgetting, enhances out-of-distribution generalization, integrates seamlessly with existing multi-task model merging baselines, improving their performance across benchmarks and model sizes, and can boost generalization when merging LLM policies aligned with different rewards via RLHF. Importantly, our method is simple to implement and complementary to many existing techniques.

What problem does this paper attempt to address?

This paper addresses catastrophic forgetting in the fine-tuning of large pre-trained models: optimizing performance on a specific task degrades the model's ability to generalize to other tasks. It also examines task interference in multi-task model merging, which further lowers the overall performance of the merged model.

### Problem Background

1. **Catastrophic Forgetting**: Fine-tuning a pre-trained model on a specific task can improve target-task performance significantly, but it often impairs the model's zero-shot generalization to other tasks.
2. **Multi-task Model Merging**: When several models fine-tuned on different tasks are merged into one, interference between tasks degrades performance.

### Solution

To address these problems, the authors propose LiNeS (Layer-increasing Network Scaling), a post-training editing technique that scales parameter updates linearly with layer depth. Specifically:

- **Single-task Fine-tuning**: The updates to shallow layers are scaled down, keeping those layers close to their pre-trained values and thereby retaining general features, while deeper layers keep their task-specific representations.
- **Multi-task Model Merging**: Layer-wise scaling is applied to the merged parameters, which reduces interference between tasks and improves multi-task performance.

### Main Contributions

1. **Proposing the LiNeS Method**: Scaling parameter updates layer by layer mitigates catastrophic forgetting and performs well across multiple benchmarks.
2. **Enhancing Multi-task Model Merging**: LiNeS significantly improves the performance of several existing multi-task model merging methods on both computer vision and natural language processing benchmarks.
3. **Simple and Easy to Implement**: LiNeS is easy to integrate into existing model merging frameworks and applies to a wide range of models and task settings.

### Experimental Verification

The paper validates LiNeS through extensive experiments, including:

- **Single-task Fine-tuning**: In image classification, LiNeS restores the generalization ability of the pre-trained model while maintaining target-task performance.
- **Multi-task Model Merging**: On multiple benchmark datasets, LiNeS significantly improves multi-task model merging results and reduces interference between tasks.
- **Robust Fine-tuning**: Combined with WiSE-FT, LiNeS further improves the model's generalization on out-of-distribution data.

In conclusion, LiNeS is a simple and effective way to retain task-specific knowledge while preserving the generalization ability of the pre-trained model, in both fine-tuning and multi-task model merging.
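The layer-wise scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It makes several assumptions that do not come from the summary: parameters are stored as a dict mapping a layer index to a flat list of floats, the scaling factor rises linearly from `alpha` at the first layer to `alpha + beta` at the last, and a plain sum of per-task updates stands in for whichever merging baseline is used. The names `layer_scales`, `lines_edit`, and `merge_then_scale` are hypothetical.

```python
from typing import Dict, List

# Assumed layout: layer index -> flat list of weights for that layer.
Params = Dict[int, List[float]]


def layer_scales(num_layers: int, alpha: float, beta: float) -> List[float]:
    """Assumed linear schedule: alpha at the first layer, alpha + beta at the last."""
    if num_layers == 1:
        return [alpha + beta]
    return [alpha + beta * i / (num_layers - 1) for i in range(num_layers)]


def lines_edit(pretrained: Params, finetuned: Params,
               alpha: float = 0.0, beta: float = 1.0) -> Params:
    """Rescale each layer's update (finetuned - pretrained) by a depth-dependent factor."""
    layers = sorted(pretrained)
    scales = layer_scales(len(layers), alpha, beta)
    edited: Params = {}
    for scale, layer in zip(scales, layers):
        edited[layer] = [
            p + scale * (f - p)  # shallow layers stay closer to the pre-trained value
            for p, f in zip(pretrained[layer], finetuned[layer])
        ]
    return edited


def merge_then_scale(pretrained: Params, finetuned_models: List[Params],
                     alpha: float, beta: float) -> Params:
    """Sum the per-task updates (a stand-in merging baseline), then apply layer scaling."""
    merged: Params = {}
    for layer, weights in pretrained.items():
        merged[layer] = [
            p + sum(model[layer][i] - p for model in finetuned_models)
            for i, p in enumerate(weights)
        ]
    return lines_edit(pretrained, merged, alpha, beta)


if __name__ == "__main__":
    pre = {0: [1.0], 1: [1.0], 2: [1.0]}
    ft = {0: [3.0], 1: [3.0], 2: [3.0]}
    # With alpha=0 and beta=1 the first layer is fully reset and the last is untouched.
    print(lines_edit(pre, ft))  # {0: [1.0], 1: [2.0], 2: [3.0]}
```

In this sketch `alpha` and `beta` are free hyperparameters. Because the edit is applied after training, it needs only the pre-trained and fine-tuned weights, which is consistent with the summary's claim that the method is simple to add to existing merging pipelines.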

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
