Abstract: Large language models (LLMs) exhibit complementary strengths across tasks, which motivates research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine candidate answers, which makes generalization to unseen data distributions challenging. Moreover, prior methods use textual responses as the communication medium, ignoring the valuable information in the models' internal representations. In this work, we propose DeePEn, a training-free ensemble framework that fuses the probability distributions yielded by different LLMs at each decoding step. Vocabulary discrepancies between heterogeneous LLMs cause token misalignment, which makes directly averaging these distributions infeasible. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space, based on relative representation theory, and performs the aggregation there. We then devise a search-based inverse transformation that maps the aggregated result back to the probability space of one of the ensembled LLMs (the main model) in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles of a general LLM with a specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.
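The three steps named in the abstract (map to a relative space, aggregate, search-based inverse) can be illustrated with a small NumPy sketch. The abstract does not specify how the relative space is built, how the models are weighted, or which objective the search minimizes, so everything below is an assumption made for illustration: the relative space is indexed by tokens shared between the vocabularies (anchors), similarity is cosine similarity of token embeddings, models are weighted equally, and the inverse step is gradient descent on a squared error. The names `relative_matrix`, `to_relative`, and `inverse_search` are hypothetical, and the embeddings are random toy data.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def relative_matrix(emb, anchor_ids):
    # Assumption: a token's relative representation is its cosine
    # similarity to each anchor token (tokens shared by all vocabularies).
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit[anchor_ids].T            # shape (vocab, n_anchors)

def to_relative(p, rel):
    # Map a next-token distribution into the shared anchor-indexed space.
    return p @ rel

def inverse_search(target, rel, p_init, steps=200, lr=0.5):
    # Assumption: search, starting from the main model's distribution,
    # for a distribution whose relative image is close to `target`.
    def loss(p):
        return float(np.sum((p @ rel - target) ** 2))
    best_p, best_loss = p_init, loss(p_init)
    logits = np.log(p_init + 1e-12)
    for _ in range(steps):
        p = softmax(logits)
        grad_p = 2.0 * (p @ rel - target) @ rel.T
        # Chain rule through the softmax parametrization.
        logits = logits - lr * p * (grad_p - p @ grad_p)
        cand = softmax(logits)
        cand_loss = loss(cand)
        if cand_loss < best_loss:               # keep the best iterate seen
            best_p, best_loss = cand, cand_loss
    return best_p

# Toy setup: two "models" with different vocabulary sizes (random data).
rng = np.random.default_rng(0)
emb_main = rng.normal(size=(6, 16))             # main model, 6 tokens
emb_aux = rng.normal(size=(8, 16))              # assistant model, 8 tokens
anchors_main = [0, 1, 2, 3]                     # shared tokens, main-model ids
anchors_aux = [2, 3, 5, 7]                      # same tokens, assistant ids

rel_main = relative_matrix(emb_main, anchors_main)
rel_aux = relative_matrix(emb_aux, anchors_aux)

p_main = softmax(rng.normal(size=6))            # stand-ins for model outputs
p_aux = softmax(rng.normal(size=8))

# Aggregate in the relative space, then return to the main model's space.
fused = 0.5 * to_relative(p_main, rel_main) + 0.5 * to_relative(p_aux, rel_aux)
p_fused = inverse_search(fused, rel_main, p_main)
next_token = int(np.argmax(p_fused))            # index in the main vocabulary
```

Because both distributions are projected onto the same anchor axes, they can be averaged even though the vocabularies have different sizes; only the inverse step depends on the main model. With random embeddings the fused result carries no meaning; the sketch only shows the data flow.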

Efficient Learning Ensemble SuperParent-one-dependence Estimator by Maximizing Conditional Log Likelihood

Ensemble selection for superparent-one-dependence estimators

To Select or to Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators

Genetic Ensemble of Extreme Learning Machine

To select or to weigh: a comparative study of model selection and model weighing for SPODE ensembles

Instance-based weighting filter for superparent one-dependence estimators

Model Weighting for One-Dependence Estimators by Measuring the Independence Assumptions

Weighted One-Dependence Forests Classifier

Superiority combination learning distributed particle swarm optimization for large-scale optimization

Semi-supervised Weighting for Averaged One-Dependence Estimators

On Optimizing Ensemble Models using Column Generation

Deep Super Learner: A Deep Ensemble for Classification Problems

Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification

Multiple surrogates and offspring-assisted differential evolution for high-dimensional expensive problems

Surrogate ensemble assisted large-scale expensive optimization with random grouping

SUOD: Toward Scalable Unsupervised Outlier Detection

Averaged Tree-Augmented One-Dependence Estimators

Cooperative coevolutionary surrogate ensemble-assisted differential evolution with efficient dual differential grouping for large-scale expensive optimization problems

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Predictive Ensemble Pruning by Expectation Propagation

Coupled Learning Enabled Stochastic Programming with Endogenous Uncertainty