Deep Model Fusion: A Survey

Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
DOI: https://doi.org/10.48550/arXiv.2309.15698
2023-09-27
Abstract: Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of different models to make up for the biases and errors of a single model to achieve better performance. However, deep model fusion on large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, high-dimensional parameter space, interference between different heterogeneous models, etc. Although model fusion has attracted widespread attention due to its potential to solve complex real-world tasks, there is still a lack of complete and detailed survey research on this technique. Accordingly, in order to understand the model fusion method better and promote its development, we present a comprehensive survey to summarize the recent progress. Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges faced by deep model fusion and propose possible research directions for model fusion in the future. Our review is helpful in deeply understanding the correlation between different model fusion methods and practical application methods, which can enlighten the research in the field of deep model fusion.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper surveys the methods and challenges of deep model fusion. Specifically, it focuses on how to combine the parameters or predictions of multiple deep learning models into a single model, in order to compensate for the biases and errors of any one model and thus achieve better performance. However, model fusion for large-scale deep learning models (such as large language models (LLMs) and foundation models) faces several challenges, including high computational cost, a high-dimensional parameter space, and interference between heterogeneous models. To address these problems, the paper does the following:

1. **Classify and review existing methods**: The paper divides existing deep model fusion methods into four categories:
   - **Mode connectivity**: Connect solutions in weight space via a path of non-increasing loss to obtain a better initialization for fusion.
   - **Alignment**: Match the units between neural networks to create better conditions for fusion.
   - **Weight average**: Directly average the weights of multiple models to obtain results closer to the optimal solution.
   - **Ensemble learning**: Combine the outputs of different models to improve the accuracy and robustness of the final model.
2. **Analyze challenges and propose future research directions**: The paper analyzes the challenges faced by deep model fusion and proposes possible research directions, for example how to reduce computational cost, handle model heterogeneity, and speed up combinatorial optimization.
3. **Provide theoretical and technical guidance**: By explaining the mechanisms of the different model fusion methods and the relationships between them, the paper offers inspiration for designing more advanced fusion methods and guidance for improving the generalization and accuracy of deep neural networks.

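
To make the difference between the last two categories concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from the paper: the toy linear model, the dictionary layout of the weights, and the names `average_weights`, `predict`, and `ensemble_predict` are all assumptions made for this example. Weight averaging fuses parameters into one model, while ensembling keeps every model and averages their outputs.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Uniformly average the parameters of models with identical layouts."""
    if not models:
        raise ValueError("need at least one model")
    keys = models[0].keys()
    for m in models:
        if m.keys() != keys:
            raise ValueError("models must share the same parameter names")
        for k in keys:
            if len(m[k]) != len(models[0][k]):
                raise ValueError(f"parameter {k!r} has mismatched sizes")
    n = len(models)
    # Element-wise mean across models for every parameter.
    return {k: [sum(vals) / n for vals in zip(*(m[k] for m in models))] for k in keys}


def predict(model: Weights, x: List[float]) -> float:
    """Toy linear model (an assumption for this sketch): y = w . x + b."""
    return sum(w * xi for w, xi in zip(model["w"], x)) + model["b"][0]


def ensemble_predict(models: List[Weights], x: List[float]) -> float:
    """Output-level fusion: run every model and average the predictions."""
    return sum(predict(m, x) for m in models) / len(models)


model_a: Weights = {"w": [1.0, 2.0], "b": [0.5]}
model_b: Weights = {"w": [3.0, 0.0], "b": [1.5]}
x = [2.0, 1.0]

fused = average_weights([model_a, model_b])       # one model, one forward pass
y_fused = predict(fused, x)
y_ensemble = ensemble_predict([model_a, model_b], x)  # two forward passes
print(fused, y_fused, y_ensemble)
```

For this linear toy model the two results coincide, because the prediction is linear in the parameters. For nonlinear networks they generally differ, and averaging weights only makes sense when the models share an architecture, which is where the alignment and mode connectivity categories above come in.
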
In conclusion, this paper aims to advance the field by systematically summarizing and analyzing the methods and techniques of deep model fusion. This not only clarifies the relationships between different model fusion methods but also provides a useful reference for practical applications.