EvoMerge: Neuroevolution for Large Language Models

Yushu Jiang
2024-01-31
Abstract: Extensive fine-tuning on Large Language Models does not always yield better results. Oftentimes, models tend to get better at imitating one form of data without gaining greater reasoning ability and may even end up losing some intelligence. Here I introduce EvoMerge, a systematic approach to large language model training and merging. Leveraging model merging for weight crossover and fine-tuning for weight mutation, EvoMerge establishes an evolutionary process aimed at pushing models beyond the limits of conventional fine-tuning.
Neural and Evolutionary Computing, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the observation that extensive fine-tuning does not always improve large language models (LLMs). A model may become better at imitating one type of data without gaining reasoning ability, and may even lose some of its original intelligence. The author therefore proposes EvoMerge, a systematic, neuroevolution-based method for training and merging LLMs. EvoMerge uses model merging as weight crossover and fine-tuning as weight mutation, aiming to push past the limits of conventional fine-tuning by simulating natural selection. The methodology consists of six key steps:

1. **Initialization**: Create the first batch of candidate models. A high-quality initial population can lead to better results, or at least speed up progress in the early generations.
2. **Evaluation**: Use a set of evaluation methods to determine the fitness of each model, that is, how good or bad it is. The evaluation must be designed carefully so that models do not overfit to one specific evaluation function.
3. **Selection**: Plan the next generation according to the fitness scores of the current generation. In principle, better-performing models should be chosen for reproduction, but some randomness is added to the selection process to maintain diversity and to account for the inaccuracy of the evaluation methods.
4. **Crossover**: Combine the pairs or groups chosen in the previous step to produce the next-generation models. For model weights, popular merging methods such as Spherical Linear Interpolation (SLERP), TIES, and DARE can be used.
5. **Mutation**: Fine-tune the newly produced models to introduce changes in their weights. Mutation is crucial because, without it, the process would only yield averages of the initial population's weights. Each mutation gives a model the opportunity to surpass its "parent" models.
6. **Repeat**: Run the above process in a loop until predetermined stopping conditions are met.

Through these steps, EvoMerge aims to explore a new way to continuously improve large language models and overcome the limitations of conventional fine-tuning.
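
The six steps above can be illustrated with a small toy sketch. It is not the paper's implementation: each "model" is assumed to be a flat list of floats, `fitness_fn` and `mutate_fn` are hypothetical callables standing in for real benchmark evaluation and real fine-tuning, selection uses softmax sampling as one possible way to add randomness, and the best model is carried over unchanged (elitism), which the summary does not mention. Only SLERP is shown for crossover.

```python
import math
import random


def lerp(w_a, w_b, t):
    """Plain linear interpolation, used as a fallback for SLERP."""
    return [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]


def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherical linear interpolation (SLERP) between two flat weight vectors."""
    dot = sum(a * b for a, b in zip(w_a, w_b))
    norms = math.sqrt(sum(a * a for a in w_a)) * math.sqrt(sum(b * b for b in w_b))
    if norms < eps:
        return lerp(w_a, w_b, t)
    # Angle between the two vectors, clamped against rounding error.
    omega = math.acos(max(-1.0, min(1.0, dot / norms)))
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # (Anti)parallel vectors: SLERP is ill-defined, so interpolate linearly.
        return lerp(w_a, w_b, t)
    k_a = math.sin((1.0 - t) * omega) / sin_omega
    k_b = math.sin(t * omega) / sin_omega
    return [k_a * a + k_b * b for a, b in zip(w_a, w_b)]


def evolve(population, fitness_fn, mutate_fn, rng, generations=10, temperature=1.0):
    """Run evaluate -> select -> crossover -> mutate for a fixed number of generations."""
    for _ in range(generations):  # Repeat: the stopping condition is a fixed budget
        # Evaluation: score every candidate.
        scores = [fitness_fn(w) for w in population]
        best = max(scores)
        # Selection: softmax weights, so weaker models keep a nonzero chance.
        weights = [math.exp((s - best) / temperature) for s in scores]
        # Elitism (an added assumption): keep the best model unchanged.
        children = [population[scores.index(best)]]
        while len(children) < len(population):
            # Parents are drawn with replacement, so both may be the same model.
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            child = slerp(parent_a, parent_b, t=rng.uniform(0.3, 0.7))  # Crossover
            children.append(mutate_fn(child, rng))  # Mutation
        population = children
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    target = [rng.gauss(0, 1) for _ in range(16)]  # hidden optimum of the toy fitness

    def fitness(w):
        # Toy fitness: closer to the target is better.
        return -math.dist(w, target)

    def mutate(w, rng):
        # Stand-in for fine-tuning: a small random perturbation of the weights.
        return [x + rng.gauss(0, 0.05) for x in w]

    # Initialization: a random starting population of 8 toy "models".
    population = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
    final = evolve(population, fitness, mutate, rng, generations=20)
    print(max(fitness(w) for w in population), max(fitness(w) for w in final))
```

Because of the elitism assumption, the best fitness in the population can never decrease from one generation to the next. In a real setting the vectors would be replaced by model checkpoints, `slerp` by a merging tool applied per tensor, and `mutate_fn` by an actual fine-tuning run.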