Abstract: Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks, which has led to the active development of a large set of diverse LLMs. However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used. A promising approach is to efficiently harness the diverse capabilities of LLMs to overcome these individual limitations. Towards this goal, we introduce a novel LLM selection algorithm called SelectLLM. This algorithm directs input queries to the most suitable subset of LLMs from a large pool, ensuring they collectively provide the correct response efficiently. SelectLLM uses a multi-label classifier, utilizing the classifier's predictions and confidence scores to design optimal policies for selecting an optimal, query-aware, and lightweight subset of LLMs. Our findings show that the proposed model outperforms individual LLMs and achieves competitive performance compared to similarly sized, computationally expensive top-performing LLM subsets. Specifically, with a similarly sized top-performing LLM subset, we achieve a significant reduction in latency on two standard reasoning benchmarks: 13% lower latency for GSM8K and 70% lower latency for MMLU. Additionally, we conduct comprehensive analyses and ablation studies, which validate the robustness of the proposed model.

What problem does this paper attempt to address?

The paper addresses the limitations of individual large language models (LLMs) on complex tasks and proposes a new solution, the SelectLLM algorithm. Its core aim is to improve performance on such tasks while reducing computational cost through an effective model selection strategy.

### Research Background and Problem

- LLMs perform well across many natural language processing tasks, but they have limitations on complex tasks such as factual reasoning and planning.
- A single LLM may perform poorly on these tasks because of factors such as training bias, model size, or the training data used.
- Existing LLMs differ in their capabilities, and no single model performs best across all benchmarks.
- Previous research has tried to combine the strengths of different LLMs through ensemble methods, but this often requires collecting responses from every model in the pool, which increases computational cost.

### Solution

- **SelectLLM Algorithm**: A novel selection algorithm that chooses, for a specific input query, the most suitable subset of models from a large pool of LLMs.
- The algorithm uses a multi-label classifier to predict how suitable each LLM is for a given query, and builds its model selection policy from the classifier's predictions and confidence scores.
- Experimental results show that SelectLLM outperforms individual LLMs and, compared with similarly sized top-performing LLM subsets, reduces latency by 13% on GSM8K and 70% on MMLU while remaining competitive in performance.

### Main Contributions

1. **Algorithm Innovation**: The SelectLLM algorithm selects the most appropriate subset of LLMs for each input query, improving response quality and reducing computational cost.
2. **Performance Improvement**: On two standard reasoning benchmarks, the algorithm achieves higher accuracy than individual LLMs and significantly lower latency than similarly sized top-performing subsets at comparable performance.
3. **Reliability Verification**: Comprehensive analyses and ablation studies support the robustness of the proposed model.

In summary, the paper targets the limitations of individual LLMs on complex tasks and proposes a query-aware method for selecting which models to run, improving overall performance while reducing cost.
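
The selection step summarized above can be illustrated with a minimal sketch. It assumes the multi-label classifier has already produced one confidence score per LLM for the query. The threshold-plus-cap policy, the majority vote over responses, and every name here (`select_subset`, `answer_query`, the stub models) are hypothetical choices made for illustration; the summary does not specify the paper's actual policies.

```python
from collections import Counter
from typing import Callable, Dict, List


def select_subset(confidences: Dict[str, float],
                  threshold: float = 0.5,
                  max_models: int = 3) -> List[str]:
    """Return the models whose confidence clears `threshold`, best first,
    capped at `max_models` (an assumed policy, not the paper's)."""
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [name for name, score in ranked if score >= threshold][:max_models]
    # Fall back to the single most confident model so every query is answered.
    if not chosen and ranked:
        chosen = [ranked[0][0]]
    return chosen


def answer_query(query: str,
                 confidences: Dict[str, float],
                 models: Dict[str, Callable[[str], str]],
                 threshold: float = 0.5,
                 max_models: int = 3) -> str:
    """Query only the selected subset and return the majority answer."""
    subset = select_subset(confidences, threshold, max_models)
    answers = [models[name](query) for name in subset]
    counts = Counter(answers)
    # max() keeps the first maximal key, so ties go to the answer produced
    # by the more confident model (answers are in confidence order).
    return max(counts, key=counts.get)


# Usage with stub "models" standing in for real LLM calls.
stub_models = {
    "model_a": lambda q: "4",
    "model_b": lambda q: "9",
    "model_c": lambda q: "5",
    "model_d": lambda q: "5",
}
stub_confidences = {"model_a": 0.9, "model_b": 0.2, "model_c": 0.7, "model_d": 0.6}

print(select_subset(stub_confidences))                      # ['model_a', 'model_c', 'model_d']
print(answer_query("2 + 3?", stub_confidences, stub_models))  # 5
```

Only the selected models are called, which is where the latency saving over querying the whole pool would come from. In this sketch, raising `threshold` or lowering `max_models` trades answer redundancy for fewer model calls.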

SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

OptLLM: Optimal Assignment of Queries to Large Language Models

LLM-Select: Feature Selection with Large Language Models

Efficient Sequential Decision Making with Large Language Models

Large Language Model-Enhanced Algorithm Selection: Towards Comprehensive Algorithm Representation

PickLLM: Context-Aware RL-Assisted Large Language Model Routing

Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

LLM2: Let Large Language Models Harness System 2 Reasoning

Large Language Model-guided Document Selection

Large Language Models Are Not Robust Multiple Choice Selectors

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

A Survey on Efficient Inference for Large Language Models

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Scaling Laws for Discriminative Classification in Large Language Models

Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance