SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao
2024-08-26
Abstract: Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models. From simple weight averaging to more sophisticated methods like AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, potential interference between parameters of individual models and the lack of interpretability in the fusion process remain significant challenges. Existing methods often try to resolve the parameter interference issue by evaluating attributes of parameters, such as their magnitude or sign, or by parameter pruning. In this study, we begin by examining the fine-tuning of linear layers through the lens of subspace analysis and explicitly define parameter interference as an optimization problem to shed light on this subject. Subsequently, we introduce an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which allows for the upscaling of source models into an MoE model without extra data or further training. Our approach relies on the observation that fine-tuning mostly keeps the important parts from the pre-training, but it uses less significant or unused areas to adapt to new tasks. Also, the issue of parameter interference, which is intrinsically intractable in the original parameter space, can be managed by expanding the dimensions. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE. Code is available at https://github.com/tanganke/fusion_bench.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to solve two key challenges in deep-learning model fusion:

1. **Parameter Interference**: When multiple pre-trained models are fused, the parameters of different models may interfere with one another, leading to performance degradation. Existing methods usually address this by evaluating attributes of the parameters (such as their magnitude or sign) or by pruning, but the effectiveness of these methods is limited.
2. **Lack of Interpretability**: The model fusion process often lacks transparency, which makes it difficult for researchers to understand how parameters are combined. This undermines trust in the fused model and hinders further improvement.

To solve these problems, the authors propose a new model-fusion method, **Zero-shot Sparse Mixture of Low-rank Experts (SMILE)**. The method rests on the following:

- **Subspace Analysis**: By analyzing the fine-tuning of linear layers in terms of subspaces, the prediction is decomposed into separate components for pre-trained knowledge and task-specific adaptation. This helps explain how a model adapts to downstream tasks while retaining pre-trained knowledge.
- **Parameter Interference as an Optimization Problem**: The parameter interference problem is formulated as an optimization problem, which gives a more rigorous and measurable perspective.
- **SMILE Construction**: The source models are upscaled into a more general MoE (Mixture of Experts) model without additional data or further training. The construction builds on the observation that fine-tuning largely preserves the important parts of the pre-trained weights and adapts to new tasks through less significant or unused subspaces.

### Main Contributions

1. **Subspace Perspective on Fine-tuning**: A novel subspace perspective that reveals how a model adapts to new tasks while retaining pre-trained knowledge, together with a formalization of parameter interference as an optimization problem.
2. **Zero-shot Sparse Mixture of Low-rank Experts (SMILE)**: A new model-fusion method that fuses existing models into a more general MoE model without additional data or further training.
3. **Extensive Experiments**: Experiments on multiple tasks and models (including image classification and text generation) that demonstrate the effectiveness and efficiency of SMILE and show significant advantages over existing model-fusion techniques.

### Experimental Results

For fully fine-tuned models, adding about 50% more parameters achieves 98-99% of the performance of 8 individually fine-tuned ViT models. For LoRA fine-tuned Flan-T5 models, adding only 2% more parameters maintains 99% of the performance. This indicates that SMILE offers a good trade-off between performance and model size.

Through these contributions, the research provides new insights and effective solutions for deep-learning model fusion, especially for complex, high-dimensional, and nonlinear model architectures, and addresses the problems of parameter interference and interpretability.
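The subspace decomposition summarized above can be written out for a single linear layer. This is a sketch rather than the paper's own notation: the symbols $W$, $b$, $\Delta W$, $\Delta b$, $U_k$, $\Sigma_k$, $V_k$, and the rank $k$ are assumptions introduced here for illustration.

$$
y = (W + \Delta W)\,x + (b + \Delta b)
  = \underbrace{W x + b}_{\text{pre-trained knowledge}}
  + \underbrace{\Delta W x + \Delta b}_{\text{task-specific adaptation}}
$$

If the update is approximated by its top-$k$ singular components, $\Delta W \approx U_k \Sigma_k V_k^{\top}$, then

$$
\Delta W x \approx U_k \Sigma_k \left( V_k^{\top} x \right),
$$

so the task-specific term depends only on the component of $x$ that lies in the span of the columns of $V_k$. Under this reading, two fine-tuned models interfere less when their updates act on different input subspaces.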
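The SMILE construction summarized above can be illustrated with a small NumPy sketch for one linear layer. It is not the `fusion_bench` API: the function names `build_low_rank_experts` and `smile_linear` are hypothetical, and the routing rule (score each expert by the norm of the input projected onto that expert's input subspace, then softmax over the selected experts) is an assumption made for illustration, as is the choice of toy data.

```python
import numpy as np


def build_low_rank_experts(w_pre, w_finetuned, rank):
    """Turn each fine-tuned weight matrix into a low-rank expert (no training)."""
    experts = []
    for w_ft in w_finetuned:
        delta = w_ft - w_pre  # task-specific update, shape (out, in)
        u, s, vt = np.linalg.svd(delta, full_matrices=False)
        # Keep only the top-`rank` singular components.
        experts.append((u[:, :rank], s[:rank], vt[:rank, :]))
    return experts


def smile_linear(x, w_pre, bias, experts, top_k=1):
    """Forward pass for a single input vector x of shape (in,)."""
    # Assumed routing score: norm of x projected onto each expert's input subspace.
    logits = np.array([np.linalg.norm(vt @ x) for _, _, vt in experts])
    chosen = np.argsort(logits)[-top_k:]  # indices of the top_k highest scores
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()  # softmax over the selected experts only
    y = w_pre @ x + bias  # shared pre-trained path
    for w, idx in zip(weights, chosen):
        u, s, vt = experts[idx]
        y = y + w * (u @ (s * (vt @ x)))  # low-rank expert path
    return y


# Toy example: two "fine-tuned" models whose rank-1 updates act on
# orthogonal input directions e0 and e1.
rng = np.random.default_rng(0)
d_out, d_in = 4, 6
w_pre = rng.normal(size=(d_out, d_in))
bias = rng.normal(size=d_out)
e0, e1 = np.eye(d_in)[0], np.eye(d_in)[1]
w_ft = [
    w_pre + np.outer(rng.normal(size=d_out), e0),
    w_pre + np.outer(rng.normal(size=d_out), e1),
]

experts = build_low_rank_experts(w_pre, w_ft, rank=1)
x = 2.0 * e0  # an input that lies entirely in the first model's subspace
y = smile_linear(x, w_pre, bias, experts, top_k=1)
print(np.allclose(y, w_ft[0] @ x + bias))  # prints True
```

In this toy setting the router sends the input to the first expert, and the merged layer reproduces the first fine-tuned model exactly, because the updates are exactly rank 1 and occupy orthogonal input directions. With real fine-tuned weights the truncation is lossy and the subspaces overlap, so the match would only be approximate.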