OptLLM: Optimal Assignment of Queries to Large Language Models

Yueyue Liu, Hongyu Zhang, Yuantian Miao, Van-Hoang Le, Zhiqiang Li
2024-05-24
Abstract: Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM substantially reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost or saves costs by 8.79% to 95.87% while maintaining the highest attainable accuracy.
Software Engineering, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to allocate queries among multiple large language models (LLMs) so as to balance cost and performance. As more companies offer LLMs as services, users must choose among models that differ in both performance and price. The paper proposes OptLLM, a framework that treats query allocation as a multi-objective optimization problem. OptLLM first predicts the performance of each candidate LLM on each query, using a multi-label classification model with uncertainty estimation, and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. Users select from these solutions according to their budget and performance preferences. Experimental results show that OptLLM matches the accuracy of the best LLM while reducing costs by 2.40% to 49.18%, and that, compared to other multi-objective optimization algorithms, it improves accuracy by 2.94% to 69.05% at the same cost.
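To make the notion of a non-dominated solution concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical matrix `success_prob[q][m]` of predicted probabilities that model `m` answers query `q` correctly and a hypothetical matrix `cost[q][m]` of per-query prices; all numbers are invented for illustration. It enumerates every assignment of a tiny toy instance and keeps those that no other assignment beats on both cost and expected accuracy. Exhaustive enumeration grows exponentially with the number of queries, which is why an iterative search such as the destruct-and-reconstruct procedure described above is needed at realistic scale.

```python
from itertools import product


def evaluate(assignment, success_prob, cost):
    """Return (total_cost, expected_accuracy) for one query-to-model assignment."""
    total_cost = sum(cost[q][m] for q, m in enumerate(assignment))
    expected_acc = sum(success_prob[q][m] for q, m in enumerate(assignment))
    return total_cost, expected_acc / len(assignment)


def dominates(a, b):
    """a dominates b if it is no worse on cost and accuracy and better on one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def non_dominated(solutions):
    """Keep the (assignment, cost, accuracy) triples that nothing dominates."""
    return [
        s for s in solutions
        if not any(dominates(t[1:], s[1:]) for t in solutions)
    ]


# Toy instance (illustrative values): 2 queries, 2 models.
# Model 0 is cheap, model 1 is expensive but more accurate.
success_prob = [[0.90, 0.95],   # query 0 is easy for both models
                [0.40, 0.90]]   # query 1 is hard for the cheap model
cost = [[1.0, 5.0],
        [1.0, 5.0]]

candidates = []
for assignment in product(range(2), repeat=2):
    c, acc = evaluate(assignment, success_prob, cost)
    candidates.append((assignment, c, acc))

front = non_dominated(candidates)
for assignment, c, acc in front:
    print(assignment, c, round(acc, 3))
```

In this toy instance, sending the easy query to the expensive model and the hard query to the cheap one is dominated: it costs the same as the reverse assignment but yields lower expected accuracy. The remaining three assignments form the set from which a user would pick according to budget.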