SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

Kaushal Kumar Maurya, KV Aditya Srivatsa, Ekaterina Kochmar
2024-08-16
Abstract: Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks, which has led to the active development of a large set of diverse LLMs. However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used. A promising approach is to efficiently harness the diverse capabilities of LLMs to overcome these individual limitations. Towards this goal, we introduce a novel LLM selection algorithm called SelectLLM. This algorithm directs input queries to the most suitable subset of LLMs from a large pool, ensuring they collectively provide the correct response efficiently. SelectLLM uses a multi-label classifier, utilizing the classifier's predictions and confidence scores to design optimal policies for selecting an optimal, query-aware, and lightweight subset of LLMs. Our findings show that the proposed model outperforms individual LLMs and achieves competitive performance compared to similarly sized, computationally expensive top-performing LLM subsets. Specifically, with a similarly sized top-performing LLM subset, we achieve a significant reduction in latency on two standard reasoning benchmarks: 13% lower latency for GSM8K and 70% lower latency for MMLU. Additionally, we conduct comprehensive analyses and ablation studies, which validate the robustness of the proposed model.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the limitations of individual large language models (LLMs) on complex tasks and proposes a new solution, the SelectLLM algorithm. Its core aim is to improve performance on such tasks while reducing computational cost through an effective model selection strategy.

### Research Background and Problem

- LLMs perform well across many natural language processing tasks, but they have limitations on complex tasks such as factual reasoning and planning.
- A single LLM may perform poorly on these tasks because of factors such as training biases, model size, or the datasets used.
- Existing LLMs exhibit different capabilities, and no single model performs best across all benchmarks.
- Previous research has tried to combine the strengths of different LLMs through ensemble methods, but this often requires collecting responses from every model in the pool, which increases computational cost.

### Solution

- **SelectLLM Algorithm**: A novel selection algorithm that routes each input query to the most suitable subset of models from a large pool of LLMs.
- The algorithm uses a multi-label classifier to predict which LLMs are suitable for a given query, and it builds its selection policies on the classifier's predictions and confidence scores.
- Experimental results show that SelectLLM outperforms individual LLMs and, compared with a similarly sized top-performing LLM subset, achieves competitive performance at lower latency.

### Main Contributions

1. **Algorithm Innovation**: SelectLLM selects a query-aware, lightweight subset of LLMs for each input query, with the goal of improving response quality and reducing computational cost.
2. **Performance Improvement**: On two standard reasoning benchmarks, the approach outperforms individual LLMs. Relative to a similarly sized top-performing LLM subset, it keeps performance competitive while lowering latency by 13% on GSM8K and 70% on MMLU.
3. **Reliability Verification**: Comprehensive analyses and ablation studies support the robustness of the proposed model.

In summary, the paper targets the limitations of individual LLMs on complex tasks and proposes a query-aware model selection method that improves overall performance while reducing cost.
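The selection step described above can be illustrated with a minimal Python sketch. It assumes the multi-label classifier has already produced one confidence score per LLM for a query. The threshold-plus-cap policy, the majority-vote aggregation, and the names `select_llms`, `majority_vote`, and `model_a` to `model_d` are hypothetical choices made for illustration; they are not the policies defined in the paper.

```python
from collections import Counter
from typing import Dict, List


def select_llms(
    confidences: Dict[str, float],
    threshold: float = 0.5,
    max_models: int = 3,
) -> List[str]:
    """Return a small, query-specific subset of model names.

    `confidences` maps each model name to the classifier's confidence that
    the model answers the current query correctly (assumed input).
    Hypothetical policy: keep models at or above `threshold`, most confident
    first, capped at `max_models`; fall back to the single most confident
    model when none clears the threshold.
    """
    if not confidences:
        raise ValueError("confidences must contain at least one model")
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [name for name, score in ranked if score >= threshold]
    chosen = chosen[:max_models]
    return chosen or [ranked[0][0]]


def majority_vote(answers: List[str]) -> str:
    """Aggregate the selected models' answers by simple majority (assumed)."""
    return Counter(answers).most_common(1)[0][0]


# Illustrative scores for one query; the values are made up.
scores = {"model_a": 0.91, "model_b": 0.35, "model_c": 0.62, "model_d": 0.58}
subset = select_llms(scores, threshold=0.5, max_models=2)
```

Only the models in `subset` would then be queried, which is where the latency saving over querying the whole pool comes from. Their answers could be combined with `majority_vote`, for example `majority_vote(["42", "41", "42"])`.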