Knowledge Fusion By Evolving Weights of Language Models

Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang
2024-06-18
Abstract: Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at https://github.com/duguodong7/model-evolution.
Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses several key problems encountered when fine-tuning large language models:

1. **Performance differences across domains and tasks**: Fine-tuning the same model in different task scenarios can lead to varying performance, so a model that does well in one context may give unsatisfactory results in another. The authors therefore integrate knowledge from models trained in different scenarios to improve performance in cross-domain and cross-task settings.
2. **Reducing the need for additional training**: Traditional methods such as multi-task learning require a large amount of labeled data and a complex training process, while methods such as federated learning face data privacy issues. This paper proposes a knowledge fusion method that requires neither further training nor additional training data.
3. **Optimizing model merging**: The paper recasts model merging as an optimization problem whose goal is to find the information most conducive to knowledge fusion, so that the fused model achieves better results than any single model.

Specifically, the authors introduce a novel model evolution method (Evolver) inspired by evolutionary algorithms. It generates new models through mutation and crossover operations and keeps those that perform better.

### Main contributions of the paper

- **Innovative model evolution method**: The paper proposes a new knowledge fusion method from an evolutionary perspective, achieved by evolving the weights of language models.
- **Improved knowledge fusion performance**: The method consistently improves knowledge fusion performance across a wide range of experimental settings.
- **Effective integration with existing model merging methods**: The proposed method can be combined with existing model merging techniques to further improve knowledge fusion performance, and it significantly outperforms the baseline methods and previous techniques.
- **Superior generalization ability**: The fused model performs well on unseen data domains and can handle new data not encountered during training.

### Method overview

The method proposed in the paper consists of the following steps:

1. **Initializing the population**: Models fine-tuned from the same pre-trained checkpoint in different environments form the initial population.
2. **Evolution process**:
   - **Mutation**: Randomly select two candidate individuals and scale the difference between them by a factor \( F \) to generate a mutant solution.
   - **Crossover**: For each element, choose between the new (mutant) individual and the parent individual according to the crossover ratio \( Cr \).
   - **Update**: Evaluate the offspring on the development data and compare it with its parent. If the offspring performs better, it replaces the parent.
3. **Integration with other merging methods**: Other model merging techniques can be applied when obtaining the final evolved model or when calculating the scores of the updated population, further improving performance.

### Experimental results

The experimental results show that Evolver outperforms existing knowledge fusion methods across multiple data domains and tasks and generalizes well across domains. In addition, the model evolution method shows more stable performance when dealing with classification heads under different initial conditions.
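
The evolution process described in the method overview (mutation, crossover, update) can be illustrated with a minimal sketch. This is not the authors' implementation: it treats each model as a flat list of floats, uses a hypothetical `toy_score` function in place of development-set evaluation, and assumes the mutant is built by adding the scaled difference to the parent itself. The names `evolve`, `toy_score`, and the default values of `F` and `Cr` are illustrative assumptions.

```python
import random


def evolve(population, score_fn, F=0.5, Cr=0.5, generations=20, seed=0):
    """Differential-evolution-style search over flat weight vectors.

    population: list of equal-length lists of floats (at least 3 members).
    score_fn:   maps a weight vector to a score, higher is better
                (a stand-in for dev-set performance; an assumption here).
    """
    if len(population) < 3:
        raise ValueError("need at least 3 individuals to pick two others")
    rng = random.Random(seed)
    pop = [list(p) for p in population]
    scores = [score_fn(p) for p in pop]
    n = len(pop)
    for _ in range(generations):
        for i in range(n):
            # Mutation: pick two other individuals and scale their difference by F.
            # Adding it to the parent pop[i] is an assumption of this sketch.
            r1, r2 = rng.sample([j for j in range(n) if j != i], 2)
            mutant = [w + F * (a - b) for w, a, b in zip(pop[i], pop[r1], pop[r2])]
            # Crossover: take each element from the mutant with probability Cr,
            # otherwise keep the parent's element.
            child = [m if rng.random() < Cr else w for m, w in zip(mutant, pop[i])]
            # Update: the child replaces its parent only if it scores higher.
            child_score = score_fn(child)
            if child_score > scores[i]:
                pop[i], scores[i] = child, child_score
    return pop, scores


# Toy usage: "models" are 4-dimensional vectors, and the score is the
# negative squared distance to a hypothetical optimum.
target = [1.0, -2.0, 0.5, 3.0]


def toy_score(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


population = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, -1.0, 1.0, 2.0],
    [1.5, -3.0, 0.0, 4.0],
    [0.5, -2.5, 1.0, 2.5],
]
initial_scores = [toy_score(p) for p in population]
final_pop, final_scores = evolve(population, toy_score)
```

Because a child is kept only when it beats its parent, no individual's score can decrease from one generation to the next. With real language models, `score_fn` would load the candidate weights and measure performance on a development set, which makes each evaluation far more expensive than in this toy setting.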
### Summary

By optimizing the weights of language models through evolutionary algorithms, this research provides a new way to improve model performance without additional training and shows strong potential in cross-domain and cross-task scenarios.