Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li

2024-08-06

Abstract: Merging Large Language Models (LLMs) aims to amalgamate multiple homologous LLMs into one with all the capabilities. Ideally, any LLMs sharing the same backbone should be mergeable, irrespective of whether they are Fine-Tuned (FT) with minor parameter changes or Pre-Trained (PT) with substantial parameter shifts. However, existing methods often manually assign the model importance, rendering them feasible only for LLMs with similar parameter alterations, such as multiple FT LLMs. The diverse ranges of parameter changes between FT and PT LLMs make it difficult for current solutions to empirically determine the optimal combination. In this paper, we make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs. We initially examine the efficacy of current methods in merging FT and PT LLMs, discovering that they struggle to deal with PT LLMs. Subsequently, we introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope, which first disentangles model weights into magnitude and direction components, and then performs adaptive fusion by considering their respective contributions. In the experiments, we merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales. Results reveal that: (1) existing solutions usually fail when merging Sailor, either losing both abilities or only retaining instruction-following skills; (2) WIDEN successfully injects the multilingual abilities of Sailor into Qwen1.5-Chat and makes it proficient in Southeast Asian languages, while also enhancing its fundamental capabilities. In light of previous research, we also merge multiple 13B FT LLMs and observe that WIDEN achieves a balanced amalgamation of instruction following, mathematical reasoning, and code generation skills.
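The disentanglement step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the DoRA-style convention of splitting a weight matrix column by column into an L2 magnitude and a unit-norm direction, and it measures how far a model has moved from the shared backbone in each component. The helper names `disentangle` and `changes` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def disentangle(W: np.ndarray, eps: float = 1e-8):
    """Split W (d x k) into column magnitudes m (k,) and unit-norm directions D (d x k)."""
    m = np.linalg.norm(W, axis=0)      # L2 norm of each column (assumed axis convention)
    D = W / (m + eps)                  # each column rescaled to unit length
    return m, D

def changes(W_backbone: np.ndarray, W_model: np.ndarray):
    """Per-column magnitude change |m - m0| and direction change 1 - cos(D, D0)."""
    m0, D0 = disentangle(W_backbone)
    m, D = disentangle(W_model)
    cos = np.sum(D * D0, axis=0)       # columns are unit-norm, so the dot product is the cosine
    return np.abs(m - m0), 1.0 - cos

rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 4))                      # toy backbone weight matrix
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))      # random orthogonal matrix

m0, D0 = disentangle(W0)                     # W0 is recovered as D0 * m0 (column-wise scaling)
dm_scale, dd_scale = changes(W0, 3.0 * W0)   # rescaled copy: only magnitudes move
dm_rot, dd_rot = changes(W0, Q @ W0)         # rotated copy: only directions move
```

This toy check shows that a pure rescaling changes only the magnitudes while a pure rotation changes only the directions, so the two components capture different kinds of parameter change that a single weight delta mixes together.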

Computation and Language

What problem does this paper attempt to address?

The paper addresses how to apply large language model (LLM) merging techniques to pre-trained (PT) models. Specifically:

- **Limitations of Existing Methods**: Existing model merging methods are mainly suited to fine-tuned (FT) models, whose parameters change relatively little from the base model. Pre-trained models, by contrast, shift their parameters substantially, and existing methods struggle to merge them effectively.
- **Research Contribution**: The paper proposes a new method based on Weight Disentanglement (WIDEN), which extends model merging from FT models to PT models. By decomposing model weights into magnitude and direction components and automatically calculating the importance of each component, WIDEN overcomes the challenges posed by the different ranges of parameter changes across models.
- **Experimental Validation**: WIDEN is validated by merging Qwen1.5-Chat, which has instruction-following skills, with Sailor, which has multilingual capabilities. The results show that WIDEN retains the instruction-following ability of Qwen1.5-Chat while successfully injecting the multilingual capabilities of Sailor.

In summary, the paper broadens the applicability of model merging by proposing WIDEN, so that merging remains effective when pre-trained models are involved.
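To make the contrast between manually assigned and automatically calculated importance concrete, here is a hedged sketch of a data-free, per-column importance score computed from magnitude and direction changes, next to a task-arithmetic baseline that uses one hand-picked coefficient. The scoring rule (normalising each model's changes by its own maximum, a temperature-scaled softmax across models, and averaging the magnitude and direction scores) is an illustrative assumption; the paper's actual WIDEN formulation may differ, and `adaptive_merge`, `task_arithmetic`, `temperature`, and `lam` are hypothetical names.

```python
import numpy as np

def disentangle(W, eps=1e-8):
    """Column magnitudes m (k,) and unit-norm directions D (d x k) of W (d x k)."""
    m = np.linalg.norm(W, axis=0)
    return m, W / (m + eps)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift by max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def task_arithmetic(W0, models, lam=0.5):
    """Baseline: one hand-picked coefficient lam shared by every model and column."""
    return W0 + lam * sum(W - W0 for W in models)

def adaptive_merge(W0, models, temperature=0.5, eps=1e-8):
    """Hypothetical WIDEN-style merge: importance derived from the weights, not set by hand."""
    m0, D0 = disentangle(W0)
    dm, dd = [], []
    for W in models:
        m, D = disentangle(W)
        dm.append(np.abs(m - m0))                 # magnitude change per column
        dd.append(1.0 - np.sum(D * D0, axis=0))   # direction change (1 - cosine) per column
    dm, dd = np.stack(dm), np.stack(dd)           # both shaped (n_models, k)
    # Normalise each model by its own maximum change, so a PT-like model's large
    # absolute shifts do not automatically outweigh an FT-like model's small ones.
    dm = dm / (dm.max(axis=1, keepdims=True) + eps)
    dd = dd / (dd.max(axis=1, keepdims=True) + eps)
    # Per-column importance: softmax across models, averaged over the two components.
    scores = 0.5 * (softmax(dm / temperature) + softmax(dd / temperature))
    merged = W0 + sum(s[None, :] * (W - W0) for s, W in zip(scores, models))
    return merged, scores

rng = np.random.default_rng(0)
W0 = rng.standard_normal((16, 8))                  # shared backbone
W_ft = W0 + 0.01 * rng.standard_normal((16, 8))    # FT-like: small parameter shift
W_pt = W0 + 0.5 * rng.standard_normal((16, 8))     # PT-like: large parameter shift

merged, scores = adaptive_merge(W0, [W_ft, W_pt])
baseline = task_arithmetic(W0, [W_ft, W_pt])
print(scores.round(2))   # per-column importance of each model
```

Because each column's scores are non-negative and sum to 1 across models, every merged column is a convex combination of the input models' columns. The per-model normalisation is the design choice intended to keep the scale gap between FT-like and PT-like parameter changes from deciding the outcome on its own.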

