Abstract: Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models. From simple weight averaging to more sophisticated methods like AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, potential interference between parameters of individual models and the lack of interpretability in the fusion process remain significant challenges. Existing methods often try to resolve the parameter interference issue by evaluating attributes of parameters, such as their magnitude or sign, or by parameter pruning. In this study, we begin by examining the fine-tuning of linear layers through the lens of subspace analysis and explicitly define parameter interference as an optimization problem to shed light on this subject. Subsequently, we introduce an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which allows for the upscaling of source models into an MoE model without extra data or further training. Our approach relies on the observation that fine-tuning mostly keeps the important parts from the pre-training, but it uses less significant or unused areas to adapt to new tasks. Also, the issue of parameter interference, which is intrinsically intractable in the original parameter space, can be managed by expanding the dimensions. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE. Code is available at https://github.com/tanganke/fusion_bench.
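
The observation that fine-tuning adapts through less significant directions of the pre-trained weights can be measured on a single linear layer. The following is a minimal sketch, not the authors' analysis code: the function name `update_energy_by_zone`, the equal-sized zone split, and the synthetic weights are all assumptions made for illustration. It measures how much of the update ΔW = W_ft − W acts on each group of right singular vectors of the pre-trained weight W.

```python
import numpy as np

def update_energy_by_zone(w_pre, w_ft, n_zones=4):
    """Fraction of the update's squared Frobenius norm that acts on each
    zone of input directions of w_pre (hypothetical helper, for illustration)."""
    # Rows of vt are right singular vectors, ordered by descending singular value.
    _, _, vt = np.linalg.svd(w_pre, full_matrices=True)
    delta = w_ft - w_pre
    # Column j of (delta @ vt.T) is delta applied to the j-th input direction.
    per_direction = np.sum((delta @ vt.T) ** 2, axis=0)
    # Group directions into equal-sized zones, most significant first.
    energy = np.array([z.sum() for z in np.array_split(per_direction, n_zones)])
    return energy / energy.sum()

# Synthetic demonstration (assumed data, not a result from the paper):
# an update built to act only on the 4 least significant input directions.
rng = np.random.default_rng(0)
w_pre = rng.standard_normal((16, 16))
_, _, vt = np.linalg.svd(w_pre)
delta = rng.standard_normal((16, 4)) @ vt[-4:, :]
fractions = update_energy_by_zone(w_pre, w_pre + 0.1 * delta)
print(fractions)  # nearly all of the energy falls in the last zone
```

Because the right singular vectors form an orthonormal basis, the per-zone fractions always sum to one. On real checkpoints, `w_pre` and `w_ft` would be the same layer taken from the pre-trained and fine-tuned models.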

What problem does this paper attempt to address?

This paper aims to address two key challenges in deep model fusion:

1. **Parameter interference**: When multiple fine-tuned models are fused, their parameters can interfere with one another, degrading performance. Existing methods usually tackle this by evaluating attributes of the parameters (such as their magnitude or sign) or by pruning, but the effectiveness of these methods is limited.
2. **Lack of interpretability**: The fusion process is often opaque, which makes it hard for researchers to understand how parameters are combined. This undermines trust in the fused model and hinders further improvement.

To address these problems, the authors propose a new model fusion method, **zero-shot Sparse MIxture of Low-rank Experts (SMILE)**. The approach has three parts:

- **Subspace analysis**: By analyzing the fine-tuning of linear layers in terms of subspaces, the layer's output is decomposed into a component carrying pre-trained knowledge and a component carrying task-specific adaptation. This helps explain how a model adapts to downstream tasks while retaining pre-trained knowledge.
- **Parameter interference as an optimization problem**: Parameter interference is formulated explicitly as an optimization problem, which gives a more rigorous and measurable perspective.
- **SMILE construction**: The source models are upscaled into a more general MoE (Mixture of Experts) model without additional data or further training. The construction relies on the observation that fine-tuning largely preserves the important parts of the pre-trained weights and adapts to new tasks through less significant or unused subspaces.

### Main Contributions

1. **Subspace perspective on fine-tuning**: A novel subspace perspective that reveals how a model adapts to new tasks while retaining pre-trained knowledge, together with a formalization of parameter interference as an optimization problem.
2. **Zero-shot Sparse MIxture of Low-rank Experts (SMILE)**: A new model fusion method that fuses existing models into a more general MoE model without additional data or further training.
3. **Extensive experiments**: Experiments across multiple tasks and models, including image classification and text generation, that demonstrate the effectiveness and efficiency of SMILE relative to existing model fusion techniques.

### Experimental Results

For fully fine-tuned models, adding about 50% more parameters achieves 98-99% of the performance of eight individually fine-tuned ViT models. For LoRA fine-tuned Flan-T5 models, adding only 2% more parameters maintains 99% of the performance. This indicates that SMILE offers a good trade-off between performance and model size.

Together, these contributions offer new insight into deep model fusion, particularly for complex, high-dimensional, nonlinear architectures, and address both parameter interference and interpretability.
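
The SMILE construction described above can be illustrated on a single linear layer. This is a minimal sketch under stated assumptions, not the implementation in the authors' repository: the class name `SmileLinearSketch`, the parameters `k`, `k_gate`, and `top_k`, the norm-based routing score, and the shared bias are all choices made here for illustration. Each expert is a rank-`k` truncated SVD of a task update ΔW_i = W_i − W, and the router scores an input by how strongly it projects onto that expert's leading right singular vectors, so no data or training is involved.

```python
import numpy as np

class SmileLinearSketch:
    """Pre-trained linear layer plus sparse low-rank experts (illustrative only)."""

    def __init__(self, w_pre, b_pre, finetuned_weights, k=2, k_gate=2, top_k=1):
        assert k_gate <= k, "this sketch reuses expert singular vectors for routing"
        self.w_pre, self.b_pre, self.top_k = w_pre, b_pre, top_k
        self.experts, self.gates = [], []
        for w_ft in finetuned_weights:
            # Truncated SVD of the task-specific update.
            u, s, vt = np.linalg.svd(w_ft - w_pre, full_matrices=False)
            self.experts.append((u[:, :k] * s[:k], vt[:k, :]))  # (U*S, V^T)
            self.gates.append(vt[:k_gate, :])                   # routing directions

    def route(self, x):
        # Score of expert i: norm of x projected onto its routing directions.
        return np.array([np.linalg.norm(g @ x) for g in self.gates])

    def __call__(self, x):
        scores = self.route(x)
        chosen = np.argsort(scores)[-self.top_k:]
        # Softmax over the selected experts only.
        weights = np.exp(scores[chosen] - scores[chosen].max())
        weights /= weights.sum()
        y = self.w_pre @ x + self.b_pre  # shared pre-trained part
        for w_i, i in zip(weights, chosen):
            us, vt = self.experts[i]
            y = y + w_i * (us @ (vt @ x))  # low-rank task-specific part
        return y

# Synthetic check (assumed data): two rank-2 updates on orthogonal input subspaces.
rng = np.random.default_rng(0)
w_pre, b_pre = rng.standard_normal((8, 8)), rng.standard_normal(8)
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
w_ft = [w_pre + rng.standard_normal((8, 2)) @ q[:, :2].T,
        w_pre + rng.standard_normal((8, 2)) @ q[:, 2:4].T]
layer = SmileLinearSketch(w_pre, b_pre, w_ft, k=2, k_gate=2, top_k=1)
x = 1.5 * q[:, 0] + 0.5 * q[:, 1]  # input lying in task 0's subspace
chosen_expert = int(np.argmax(layer.route(x)))
y = layer(x)
print(chosen_expert, np.allclose(y, w_ft[0] @ x + b_pre))
```

In this toy setting the two updates occupy orthogonal input subspaces, so the router selects expert 0 and the layer reproduces the first fine-tuned model's output. Real task updates overlap, which is where the choice of `k`, `k_gate`, and `top_k` trades accuracy against added parameters.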

SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

Parameter Efficient Multi-task Model Fusion with Partial Linearization

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

LM-Cocktail: Resilient Tuning of Language Models via Model Merging

FusionBench: A Comprehensive Benchmark of Deep Model Fusion

Deep Model Fusion: A Survey

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

Tutel: Adaptive Mixture-of-Experts at Scale

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

Cool-Fusion: Fuse Large Language Models without Training

ProFuser: Progressive Fusion of Large Language Models

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic

Fusing Models with Complementary Expertise

Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning

Sparse Fusion for Multimodal Transformers

A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts