LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity

Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Ling Liu
2024-10-05
Abstract: Combining large language models during training or at inference time has shown substantial performance gain over component LLMs. This paper presents LLM-TOPLA, a diversity-optimized LLM ensemble method with three unique properties: (i) We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble. (ii) We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs. Our pruning method recommends top-performing LLM sub-ensembles of size $S$, often much smaller than $N$. (iii) We generate new output for each prompt query by utilizing a learn-to-ensemble approach, which learns to detect and resolve the output inconsistency among all component LLMs of an ensemble. Extensive evaluation on four different benchmarks shows good performance gain over the best LLM ensemble methods: (i) In constrained solution set problems, LLM-TOPLA outperforms the best-performing ensemble (Mixtral) by 2.2\% in accuracy on MMLU and the best-performing LLM ensemble (MoreAgent) on GSM8k by 2.1\%. (ii) In generative tasks, LLM-TOPLA outperforms the top-2 performers (Llama70b/Mixtral) on SearchQA by $3.9\mathrm{x}$ in F1, and on XSum by more than $38$ in ROUGE-1. Our code and dataset, which contains outputs of 8 modern LLMs on 4 benchmarks, is available at https://github.com/git-disl/llm-topla
Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two key issues in building ensembles of large language models (LLMs):

1. **How to select the best model combination from a large pool of open-source or closed-source LLMs**:
   - Modern LLMs have billions of parameters, are trained on vast datasets, and perform well on many zero-shot and one-shot tasks. Choosing the best combination from the many available models is nevertheless a challenge.
2. **How to combine potentially conflicting outputs from multiple LLMs to achieve the best generative output for the target learning task**:
   - Multiple LLMs may produce different or even contradictory outputs. Detecting and resolving these inconsistencies to generate a high-quality final output is the second issue.

To address these problems, the paper proposes LLM-TOPLA, a diversity-optimized LLM ensemble method with the following three features:

1. **Introducing a focal diversity metric**:
   - This metric captures the correlation between diversity and performance among the component LLMs of an ensemble.
2. **Developing a diversity-optimized ensemble pruning algorithm**:
   - This algorithm selects the top-k sub-ensembles from a pool of N base LLMs. The recommended sub-ensembles have size S, usually much smaller than N, yet perform comparably or better.
3. **Utilizing a learn-to-ensemble method to generate new outputs**:
   - This method learns to detect and resolve output inconsistencies among all component LLMs, generating LLM-TOPLA's output for each query.

### Experimental Results

The paper evaluates LLM-TOPLA on four different benchmarks and reports gains over the best existing LLM ensemble methods:

1. **In constrained solution set problems**:
   - LLM-TOPLA outperforms the best-performing ensemble (Mixtral) by 2.2% in accuracy on MMLU and the best-performing LLM ensemble (MoreAgent) by 2.1% on GSM8k.
2. **In generative tasks**:
   - LLM-TOPLA outperforms the top-2 performers (Llama70b/Mixtral) by 3.9x in F1 on SearchQA and by more than 38 in ROUGE-1 on XSum.

### Summary

LLM-TOPLA addresses both issues, selecting the best model combination from many LLMs and combining their outputs, by introducing a focal diversity metric, developing a diversity-optimized ensemble pruning algorithm, and using a learn-to-ensemble method to resolve output inconsistencies, which improves the performance of the resulting ensemble.
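
The pruning step described above, selecting the top-k sub-ensembles of size S from a pool of N base LLMs, can be illustrated with a minimal Python sketch. The abstract does not define the focal diversity metric, so this sketch substitutes mean pairwise disagreement as a stand-in score; it is not the paper's metric. The function names, the model names, and the correctness vectors are all hypothetical and serve only to show the enumerate-score-rank pattern.

```python
from itertools import combinations


def pairwise_disagreement(correct_a, correct_b):
    """Fraction of samples on which exactly one of the two models is correct."""
    differing = sum(a != b for a, b in zip(correct_a, correct_b))
    return differing / len(correct_a)


def ensemble_diversity(members, correctness):
    """Mean pairwise disagreement over all member pairs.

    Stand-in score only (assumption): NOT the focal diversity metric of the paper.
    """
    pairs = list(combinations(members, 2))
    total = sum(
        pairwise_disagreement(correctness[a], correctness[b]) for a, b in pairs
    )
    return total / len(pairs)


def top_k_subensembles(correctness, size, k):
    """Score every sub-ensemble of the given size and keep the k most diverse."""
    if size < 2:
        raise ValueError("a sub-ensemble needs at least two members")
    scored = [
        (ensemble_diversity(combo, correctness), combo)
        for combo in combinations(sorted(correctness), size)
    ]
    # Highest diversity first; ties broken by member names for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


# Hypothetical per-sample correctness (1 = correct) on six validation prompts.
correctness = {
    "model_a": [1, 1, 1, 0, 0, 0],
    "model_b": [1, 1, 1, 0, 0, 1],
    "model_c": [0, 0, 0, 1, 1, 1],
    "model_d": [1, 0, 1, 0, 1, 0],
}

best = top_k_subensembles(correctness, size=2, k=2)
for score, members in best:
    print(f"{members}: diversity={score:.3f}")
```

Exhaustive enumeration is only practical for small pools, since the number of candidate sub-ensembles grows combinatorially with N. A diversity score alone also ignores accuracy; the paper's metric is described as capturing the diversity-performance correlation, which this stand-in does not attempt.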