Abstract: In this paper, we introduce a novel approach to large language model merging based on black-box multi-objective optimization algorithms. The goal of model merging is to combine multiple models, each excelling at different tasks, into a single model that outperforms every individual source model. However, model merging faces two significant challenges. First, existing methods rely heavily on human intuition and customized strategies to handle multiple tasks. Second, it is difficult to find a good merging configuration within a limited number of evaluations. To address these challenges, we propose a multi-objective optimization based model merging method named MM-MO. The proposed method uses multi-objective optimization algorithms to search automatically for merging configurations that serve multiple tasks. Moreover, to obtain high-quality merging configurations within a limited number of evaluation iterations, we made several improvements to multi-objective Bayesian optimization tailored to model merging. First, we introduce a weak-to-strong method to improve the acquisition strategy. Second, we employ Fisher information to select configurations, further increasing the chance of discovering superior merging configurations. Third, we design a sparsity metric as an additional optimization objective to improve the model's generalization across tasks. Comprehensive experiments comparing our method with other mainstream model merging methods show that it consistently outperforms them. Moreover, performance improvements are observed even on tasks that were not explicitly targeted as optimization objectives, indicating that our method enhances the overall potential of the model. ...
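The abstract does not spell out how candidate merging configurations are compared. In multi-objective optimization generally, candidates are ranked by Pareto dominance: one configuration dominates another if it is at least as good on every objective and strictly better on at least one. The minimal sketch below assumes a merging configuration is a vector of linear merging coefficients, one per source model, and uses a made-up sparsity measure and synthetic task scores; `Config`, `merge_linear`, `sparsity`, and `toy_task_scores` are all hypothetical names. It illustrates only the Pareto-selection step, not MM-MO's Bayesian optimization, weak-to-strong acquisition, or Fisher-information selection.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: one linear merging coefficient per source model.
Config = Tuple[float, ...]


def merge_linear(models: List[List[float]], weights: Config) -> List[float]:
    """Weighted sum of flattened parameter vectors, one weight per source model."""
    return [sum(w * m[k] for w, m in zip(weights, models))
            for k in range(len(models[0]))]


def sparsity(config: Config, eps: float = 1e-3) -> float:
    """Hypothetical sparsity objective: fraction of near-zero merging weights."""
    return sum(abs(w) < eps for w in config) / len(config)


def toy_task_scores(config: Config) -> Tuple[float, float]:
    """Synthetic stand-in for benchmarking the merged model on two tasks."""
    task_a = 1.0 - (config[0] - 0.7) ** 2
    task_b = 1.0 - (config[1] - 0.6) ** 2
    return task_a, task_b


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a Pareto-dominates b (every objective is maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


candidates: List[Config] = [(0.7, 0.3, 0.0), (0.4, 0.6, 0.0),
                            (0.5, 0.5, 0.0), (0.3, 0.3, 0.4)]
# Objective vector per candidate: (task A, task B, sparsity), all maximized.
scores = [(*toy_task_scores(c), sparsity(c)) for c in candidates]
front = pareto_front(scores)
print(front)  # [0, 1, 2]: the last candidate is dominated
print(merge_linear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], candidates[2]))  # [2.0, 3.0]
```

No single candidate on the front is best on every task, which is why a multi-objective search returns a set of trade-off configurations rather than one answer; adding sparsity as a third objective lets a sparser configuration survive even when its task scores tie with a denser one.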

Combined maximum entropy language model using different feature sets

Efficient representation and fast look-up of Maximum Entropy language models

A maximum entropy approach to adaptive statistical language modelling

An Improved Maximum Entropy Language Model and Its Application

Trigger-based language models: a maximum entropy approach

Incorporating Linguistic Structure into Maximum Entropy Language Models

An Improved Maximum Entropy Language Model

Perplexity Measuring of Language Model and the Entropy Estimating of Chinese

Model and Simulation of Maximum Entropy Phrase Reordering of English Text in Language Learning Machine

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Toward Inference-optimal Mixture-of-Expert Large Language Models

A Fast Algorithm for Feature Selection in Conditional Maximum Entropy Modeling

Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Maximum Entropy based Rule Selection Model for Syntax-based Statistical Machine Translation

Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models

Tuning Language Models by Mixture-of-Depths Ensemble

HMoE: Heterogeneous Mixture of Experts for Language Modeling

Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

MH-MoE: Multi-Head Mixture-of-Experts

It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization