MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm

Daniel Yun
2024-06-28
Abstract: In this paper, we introduce MeGA, a novel method that uses a genetic algorithm to merge the weights of multiple pre-trained neural networks. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach uses a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This technique allows the merged model to inherit advantageous features from its parent models, resulting in enhanced accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic algorithm-based weight merging method improves test accuracy compared to individual models and conventional methods. This approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. Code is available on GitHub at: https://github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm
Neural and Evolutionary Computing, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to merge the weights of multiple pre-trained neural networks so that the merged model draws on the strengths of each and performs better overall. Traditional weight averaging and ensemble methods often fail to fully exploit the capabilities of pre-trained networks. The paper proposes MeGA, a method based on a genetic algorithm that optimizes the weight combination through tournament selection, crossover, and mutation, producing a more effective fused model. The merged model can inherit advantageous features from each parent model, which improves accuracy and robustness. In experiments on the CIFAR-10 dataset, the approach achieves higher test accuracy than the individual models and traditional merging methods, and it is presented as a scalable solution for multi-model integration in various deep learning applications. The method can also merge the weights of networks that were initialized differently and trained independently, which the authors present as evidence of its flexibility and robustness.
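The selection, crossover, and mutation loop described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes each network's weights have been flattened into a 1-D NumPy array, and it assumes uniform crossover, Gaussian mutation, and single-individual elitism, none of which are specified in the summary. The function names, hyperparameters, and the toy `fitness` function are hypothetical; with real networks the fitness would plausibly be accuracy on held-out data.

```python
import numpy as np

def tournament_select(population, fitnesses, rng, k=3):
    """Draw k individuals at random and return the fittest one."""
    idx = rng.choice(len(population), size=k, replace=False)
    return population[idx[np.argmax(fitnesses[idx])]]

def crossover(a, b, rng):
    """Uniform crossover (assumed): each weight comes from parent a or b."""
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(w, rng, rate=0.05, scale=0.01):
    """Add small Gaussian noise to a random subset of the weights."""
    mask = rng.random(w.shape) < rate
    return w + mask * rng.normal(0.0, scale, size=w.shape)

def merge_weights(parents, fitness_fn, pop_size=20, generations=30, seed=0):
    """Evolve a merged weight vector from two or more parent vectors."""
    rng = np.random.default_rng(seed)
    parents = [np.asarray(p, dtype=float) for p in parents]
    # Seed the population with the parents plus random mixes of parent pairs.
    population = list(parents)
    while len(population) < pop_size:
        i, j = rng.choice(len(parents), size=2, replace=False)
        population.append(crossover(parents[i], parents[j], rng))
    population = np.stack(population)
    for _ in range(generations):
        fitnesses = np.array([fitness_fn(w) for w in population])
        # Elitism (an assumption): carry the best individual over unchanged.
        children = [population[np.argmax(fitnesses)].copy()]
        while len(children) < pop_size:
            a = tournament_select(population, fitnesses, rng)
            b = tournament_select(population, fitnesses, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = np.stack(children)
    fitnesses = np.array([fitness_fn(w) for w in population])
    best = int(np.argmax(fitnesses))
    return population[best], float(fitnesses[best])

# Toy stand-in for two trained models: each parent matches the target
# on only half of its weights, so neither is optimal alone.
target = np.linspace(-1.0, 1.0, 16)
parent_a = target.copy()
parent_a[8:] = 0.0
parent_b = target.copy()
parent_b[:8] = 0.0

def fitness(w):
    """Hypothetical fitness: negative squared distance to the target."""
    return -float(np.sum((w - target) ** 2))

merged, merged_fitness = merge_weights([parent_a, parent_b], fitness)
```

Because the parents are part of the initial population and the best individual is always carried forward, the merged vector in this sketch can never score worse than the better parent. That guarantee comes from the elitism assumption, not from anything stated in the paper summary.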