Abstract: Large language models (LLMs) exhibit complementary strengths across various tasks, motivating research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, which makes generalization to unseen data distributions challenging. Moreover, prior methods use textual responses as the communication medium, ignoring the valuable information in internal representations. In this work, we propose DeePEn, a training-free ensemble framework that fuses the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs makes directly averaging the distributions infeasible due to token misalignment. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space based on relative representation theory, and performs aggregation there. Next, we devise a search-based inverse transformation that maps the aggregated result back to the probability space of one of the ensembled LLMs (the main model) in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles of an LLM with a specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.
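The pipeline described in the abstract (relative transformation, aggregation, inverse transformation) can be illustrated with a minimal, hypothetical sketch in plain Python. This is not the authors' implementation. It assumes that (i) the anchors are the tokens shared by both vocabularies, (ii) a token's relative representation is the vector of cosine similarities between its embedding and the anchor embeddings, (iii) aggregation is a weighted average with made-up uniform weights, and (iv) the search-based inverse step is plain gradient descent on the main model's logits, minimising squared error in the relative space. The abstract only states that this step is search-based, so treat (iv) in particular as an assumption. All function names, vocabularies, embeddings, and distributions are invented for illustration. Because cosine similarity is invariant to rotations, the second model's embeddings are a 90-degree rotation of the first model's on the shared tokens, so the anchors' relative representations coincide even though the raw embedding spaces differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def relative_matrix(emb, vocab, anchors):
    """Row i holds the cosine similarities of vocab[i] to every anchor token (assumed definition)."""
    anchor_vecs = [emb[a] for a in anchors]
    return [[cosine(emb[tok], av) for av in anchor_vecs] for tok in vocab]


def to_relative(probs, rel):
    """Relative transformation: probability-weighted sum of token rows (p @ R)."""
    return [sum(p * row[j] for p, row in zip(probs, rel)) for j in range(len(rel[0]))]


def aggregate(rel_reps, weights):
    """Weighted average of the models' relative representations."""
    return [sum(w * r[j] for w, r in zip(weights, rel_reps)) for j in range(len(rel_reps[0]))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def fit_loss(probs, rel, target):
    """Squared distance between p @ R and the aggregated target."""
    return sum((a - b) ** 2 for a, b in zip(to_relative(probs, rel), target))


def inverse_search(target, rel, init_probs, steps=300, lr=0.1):
    """Hypothetical inverse step: gradient descent on logits z (p = softmax(z))
    to minimise fit_loss, starting from the main model's own distribution."""
    z = [math.log(max(p, 1e-12)) for p in init_probs]
    k = len(rel[0])
    for _ in range(steps):
        p = softmax(z)
        residual = [r - t for r, t in zip(to_relative(p, rel), target)]
        # dL/dp_i = 2 * sum_j residual_j * R[i][j]
        g = [2 * sum(residual[j] * row[j] for j in range(k)) for row in rel]
        # Chain rule through softmax: dL/dz_i = p_i * (g_i - sum_m p_m * g_m)
        mean_g = sum(pi * gi for pi, gi in zip(p, g))
        z = [zi - lr * pi * (gi - mean_g) for zi, pi, gi in zip(z, p, g)]
    return softmax(z)


# Toy, hypothetical setup: two models with different vocabularies whose
# embedding spaces differ by a 90-degree rotation on the shared tokens.
vocab_a = ["the", "cat", "dog", "sat"]  # main model
vocab_b = ["the", "cat", "dog", "ran"]
emb_a = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "dog": [0.6, 0.8], "sat": [0.8, -0.6]}
emb_b = {"the": [0.0, 1.0], "cat": [-1.0, 0.0], "dog": [-0.8, 0.6], "ran": [-0.6, -0.8]}

shared = set(vocab_b)
anchors = [tok for tok in vocab_a if tok in shared]  # tokens common to both vocabularies
R_a = relative_matrix(emb_a, vocab_a, anchors)
R_b = relative_matrix(emb_b, vocab_b, anchors)

p_a = [0.1, 0.5, 0.3, 0.1]  # made-up next-token distributions at one decoding step
p_b = [0.1, 0.2, 0.6, 0.1]

target = aggregate([to_relative(p_a, R_a), to_relative(p_b, R_b)], [0.5, 0.5])
p_hat = inverse_search(target, R_a, p_a)
next_token = vocab_a[max(range(len(p_hat)), key=p_hat.__getitem__)]
print(next_token, [round(x, 3) for x in p_hat])
```

Starting the search from the main model's own distribution keeps the result close to what the main model alone would have predicted. The learning rate and step count are illustrative choices, not tuned values.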

Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

Ensembles of Locally Independent Prediction Models

LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity

Enhancing Ensemble Clustering with Adaptive High-Order Topological Weights

Neural Network Ensembles: Theory, Training, and the Importance of Explicit Diversity

Efficient Diversity-Driven Ensemble for Deep Neural Networks

Structural Diversity for Decision Tree Ensemble Learning

Diversity Learning: Introducing the Space-time Scheme to Ensemble Learning

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Understanding the Role of Functional Diversity in Weight-Ensembling with Ingredient Selection and Multidimensional Scaling

Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Ensemble Learning through Diversity Management: Theory, Algorithms, and Applications

Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness

Exploring Model Learning Heterogeneity for Boosting Ensemble Robustness

Transductive Ensemble Learning for Neural Machine Translation

A Unified Theory of Diversity in Ensemble Learning

Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices

Assembling ensembling: An adventure in approaches across disciplines

Derivative Free Weight-space Ensembling

Transfer learning for ensembles: reducing computation time and keeping the diversity