Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration.
Yichong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu
DOI: https://doi.org/10.48550/arxiv.2404.12715
2024-01-01
Abstract: Large language models (LLMs) exhibit complementary strengths in various tasks, motivating research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, which poses a great challenge to generalization on unseen data distributions. Besides, prior methods use textual responses as the communication medium, ignoring the valuable information in the internal representations. In this work, we propose a training-free ensemble framework, DeePEn, which fuses the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs makes directly averaging the distributions infeasible because of token misalignment. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space based on relative representation theory, and performs aggregation there. Next, we devise a search-based inverse transformation to transform the aggregated result back to the probability space of one of the ensembling LLMs (the main model) in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles between an LLM and a specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.
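
The pipeline described in the abstract (map each model's distribution into a shared relative space, aggregate, then search for a distribution in the main model's space) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity to anchor tokens as the relative representation, the equal fusion weights, and the squared-error gradient search used as the inverse transformation. The random embeddings stand in for real model embeddings.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def relative_matrix(emb, anchor_ids):
    """Row v holds the cosine similarity of token v to each anchor token.
    (Assumed form of the relative representation.)"""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit[anchor_ids].T          # shape (vocab, n_anchors)

def to_relative(probs, rel):
    """Map a distribution over a model's vocabulary into the anchor space."""
    return probs @ rel                        # shape (n_anchors,)

def inverse_search(target, rel, init_probs, steps=200, lr=0.5):
    """Search for a distribution in the main model's space whose relative
    representation is close to `target`. Plain gradient descent on logits
    with a squared-error loss (an illustrative choice); the best iterate
    seen so far is returned."""
    logits = np.log(init_probs + 1e-12)
    best_p, best_err = None, np.inf
    for _ in range(steps):
        p = softmax(logits)
        resid = p @ rel - target
        err = float(resid @ resid)
        if err < best_err:
            best_p, best_err = p, err
        grad_p = 2.0 * rel @ resid                     # dLoss/dp
        logits -= lr * p * (grad_p - p @ grad_p)       # chain rule through softmax
    return best_p

# Toy setup: two "models" with different vocabulary and embedding sizes.
rng = np.random.default_rng(0)
emb_main = rng.normal(size=(6, 4))    # main model: 6 tokens
emb_aux = rng.normal(size=(5, 3))     # auxiliary model: 5 tokens
anchors_main = [0, 1, 2]              # hypothetical shared tokens, as indexed by each model
anchors_aux = [0, 1, 2]

R_main = relative_matrix(emb_main, anchors_main)
R_aux = relative_matrix(emb_aux, anchors_aux)

p_main = softmax(rng.normal(size=6))  # next-token distribution of each model
p_aux = softmax(rng.normal(size=5))

# Aggregate in the shared relative space (equal weights assumed).
fused = 0.5 * to_relative(p_main, R_main) + 0.5 * to_relative(p_aux, R_aux)

# Map back to the main model's probability space and pick the next token.
p_new = inverse_search(fused, R_main, p_main)
next_token = int(np.argmax(p_new))
```

Because both distributions are projected onto the same set of anchor tokens, the two vectors being averaged have the same length even though the vocabularies differ, which is what makes aggregation possible in this sketch.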