Abstract: Extensive fine-tuning on Large Language Models does not always yield better results. Oftentimes, models tend to get better at imitating one form of data without gaining greater reasoning ability and may even end up losing some intelligence. Here I introduce EvoMerge, a systematic approach to large language model training and merging. Leveraging model merging for weight crossover and fine-tuning for weight mutation, EvoMerge establishes an evolutionary process aimed at pushing models beyond the limits of conventional fine-tuning.

What problem does this paper attempt to address?

This paper addresses the following problem: extensive fine-tuning of large language models (LLMs) does not always improve performance. A model may become better at imitating a certain type of data without improving its reasoning ability, and may even lose some of its original intelligence. To address this, the author proposes EvoMerge, a systematic, neuroevolution-based method for training and merging large language models. EvoMerge uses model merging as weight crossover and fine-tuning as weight mutation, aiming to push past the limits of conventional fine-tuning by simulating natural selection. The method consists of six key steps:

1. **Initialization**: Create the first batch of candidate models. A high-quality initial population can lead to better results, or at least speed up progress in the early generations.
2. **Evaluation**: Use a set of evaluation methods to determine the fitness of each model, that is, how good or bad it is. The evaluation must be designed carefully so that models do not overfit to one specific evaluation function.
3. **Selection**: Plan the next generation according to the fitness scores of the current generation. In principle, better-performing models should be selected for reproduction, but some randomness should be added to the selection process, both to maintain diversity and because the evaluation methods are themselves imprecise.
4. **Crossover**: Combine the pairs or groups chosen in the previous step to produce the next-generation models. For evolving model weights, popular model merging methods such as Spherical Linear Interpolation (SLERP), TIES, and DARE can be used.
5. **Mutation**: Fine-tune the newly produced models to introduce changes in their weights. Mutation is crucial because without it the process would only yield averages of the initial population's weights. Each mutation gives a model an opportunity to surpass its "parent" models.
6. **Repeat**: Run the above process in a loop until a predetermined stopping condition is met.

Through these steps, EvoMerge aims to provide a way to keep improving large language models and to overcome the limitations of conventional fine-tuning.
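The six steps can be illustrated with a toy sketch in which each "model" is just a short list of floats. This is not the author's implementation: the fitness function, the tournament selection rule, the elitism step, and the use of Gaussian noise as a stand-in for fine-tuning are all assumptions made for illustration, and every function name here is hypothetical. Only the SLERP crossover corresponds to a merging method named in the summary.

```python
import math
import random
from typing import Callable, List

Weights = List[float]


def slerp(a: Weights, b: Weights, t: float) -> Weights:
    """Crossover: spherical linear interpolation between two weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    linear = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a == 0.0 or norm_b == 0.0:
        return linear
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return linear
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def tournament(population: List[Weights], scores: List[float],
               rng: random.Random, k: int = 2) -> Weights:
    """Selection with randomness: best of k randomly drawn candidates (assumed rule)."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def mutate(weights: Weights, rng: random.Random, scale: float) -> Weights:
    """Mutation: Gaussian noise as a cheap stand-in for a fine-tuning run."""
    return [w + rng.gauss(0.0, scale) for w in weights]


def evolve(population: List[Weights], fitness: Callable[[Weights], float],
           generations: int, rng: random.Random,
           mutation_scale: float = 0.05) -> List[Weights]:
    """Repeat evaluation, selection, crossover, and mutation for a fixed budget."""
    for _ in range(generations):  # stopping condition: generation count
        scores = [fitness(w) for w in population]  # evaluation
        best = max(range(len(population)), key=lambda i: scores[i])
        children = [population[best]]  # elitism: an assumption, not from the source
        while len(children) < len(population):
            parent_1 = tournament(population, scores, rng)
            parent_2 = tournament(population, scores, rng)
            child = slerp(parent_1, parent_2, rng.random())
            children.append(mutate(child, rng, mutation_scale))
        population = children
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -0.2, 0.8]  # hypothetical "ideal" weights for the toy fitness

    def toy_fitness(w: Weights) -> float:
        return -sum((x - y) ** 2 for x, y in zip(w, target))

    initial = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
    final = evolve(initial, toy_fitness, generations=20, rng=rng)
    print("best fitness before:", max(toy_fitness(w) for w in initial))
    print("best fitness after: ", max(toy_fitness(w) for w in final))
```

In a real setting, `toy_fitness` would be replaced by benchmark evaluation, `slerp` would operate per tensor across full checkpoints (or be swapped for TIES or DARE), and `mutate` would be an actual fine-tuning run. Because the sketch carries the best individual over unchanged, the best fitness in this toy version never decreases between generations.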

EvoMerge: Neuroevolution for Large Language Models
