AutoMix: Automatically Mixing Language Models

Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, Pei Zhou, Aditya Gupta, Dheeraj Rajagopal, Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam
2024-06-29
Abstract: Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM. Central to AutoMix are two key technical contributions. First, it has a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring extensive training. Second, given that self-verification can be noisy, it employs a POMDP-based router that can effectively select an appropriately sized model, based on answer confidence. Experiments across five language models and five challenging datasets show that AutoMix consistently surpasses strong baselines, reducing computational cost by over 50% for comparable performance.
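The self-verification step can be illustrated with a minimal sketch, assuming that the verifier is the small model itself sampled several times at non-zero temperature, and that confidence is the fraction of sampled verdicts that say "Yes". The prompt template, the sample count `k`, the fixed `threshold`, and the names `verification_confidence` and `route` are hypothetical; the routing rule here is a plain threshold, not the POMDP router described in the abstract.

```python
from typing import Callable, Tuple

# Hypothetical: a "model" is any callable mapping a prompt string to a completion string.
Model = Callable[[str], str]

# Hypothetical verification prompt; the paper's actual wording may differ.
VERIFY_TEMPLATE = (
    "Context: {context}\n"
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Is the proposed answer correct given the context? Reply Yes or No.\n"
    "Verdict:"
)


def verification_confidence(
    verifier: Model,
    context: str,
    question: str,
    answer: str,
    few_shot_prefix: str = "",
    k: int = 8,
) -> float:
    """Sample k few-shot verdicts and return the fraction that start with 'Yes'."""
    prompt = few_shot_prefix + VERIFY_TEMPLATE.format(
        context=context, question=question, answer=answer
    )
    verdicts = [verifier(prompt) for _ in range(k)]
    yes_votes = sum(1 for v in verdicts if v.strip().lower().startswith("yes"))
    return yes_votes / k


def route(
    small: Model,
    large: Model,
    verifier: Model,
    context: str,
    question: str,
    threshold: float = 0.5,
    k: int = 8,
) -> Tuple[str, str, float]:
    """Answer with the small model; escalate to the large model if confidence is low."""
    task_prompt = f"{context}\n{question}"
    answer = small(task_prompt)
    conf = verification_confidence(verifier, context, question, answer, k=k)
    if conf >= threshold:
        return answer, "small", conf
    return large(task_prompt), "large", conf
```

With a stub verifier that always replies "No", `route` returns the large model's answer with confidence 0.0; a verifier that says "Yes" in 6 of 8 samples yields confidence 0.75 and keeps the small model's answer.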
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper proposes AutoMix, a method for making effective use of large language models (LLMs) that are offered in different sizes and configurations. As the range of available LLMs grows, choosing the model that best balances computational cost against performance becomes a challenge. AutoMix addresses this by routing queries to a larger model based on an estimate of whether the output of a smaller model is correct.

AutoMix makes two key technical contributions. First, a few-shot self-verification mechanism estimates the reliability of the model's own outputs without extensive training. Second, because self-verification can be noisy, a router based on a Partially Observable Markov Decision Process (POMDP) selects an appropriately sized model based on answer confidence. Experiments across five language models and five datasets show that AutoMix surpasses strong baselines, reducing computational cost by more than 50% for comparable performance.

The paper identifies several challenges: determining the optimal model configuration, handling complex and varied tasks, and working within the constraints of black-box APIs. AutoMix addresses these through self-verification and the POMDP router, which can be learned from limited data and adapts to settings with different numbers of models, costs, and capabilities. Unlike existing model-switching strategies, AutoMix does not require extensive training of a separate router; its routing policy can be learned from a small amount of data. By combining self-verification with model-size selection, it improves efficiency and reduces wasted computation, especially on complex tasks.

In conclusion, AutoMix is an automatic framework for mixing language models of different sizes. It optimizes the trade-off between cost and performance through self-verification and confidence-based routing, giving users a more efficient way to use language models.
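The idea behind the POMDP router, treating the noisy verifier confidence as an observation of a hidden state and acting on a belief, can be loosely illustrated with a single-step Bayesian decision. This is a simplification and not the paper's formulation: here the hidden state is only whether the small model's answer is correct, the observation is the discretized self-verification confidence, and the observation model is estimated from a few labeled examples. The function names, the binning, the smoothing constant `alpha`, and the reward (expected accuracy minus a weighted cost) are all assumptions; a full POMDP policy would plan over several models rather than make one myopic choice.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Simplified hidden state: is the small model's answer correct (True) or not (False)?
# Observation: the self-verification confidence, discretized into bins.


def discretize(conf: float, bins: int = 4) -> int:
    """Map a confidence in [0, 1] to a bin index in 0..bins-1."""
    return min(int(conf * bins), bins - 1)


def fit_observation_model(
    samples: List[Tuple[float, bool]], bins: int = 4, alpha: float = 1.0
) -> Tuple[Dict[bool, List[float]], float]:
    """Estimate P(observation bin | state) with Laplace smoothing, plus prior P(correct).

    samples: (verifier confidence, whether the small model's answer was actually correct).
    """
    counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    for conf, correct in samples:
        counts[correct][discretize(conf, bins)] += 1
    obs_model = {}
    for state in (True, False):
        total = sum(counts[state].values()) + alpha * bins
        obs_model[state] = [(counts[state][b] + alpha) / total for b in range(bins)]
    prior = sum(1 for _, correct in samples if correct) / len(samples)
    return obs_model, prior


def belief_small_correct(
    conf: float, obs_model: Dict[bool, List[float]], prior: float, bins: int = 4
) -> float:
    """Bayes update: P(small answer correct | observed confidence bin)."""
    o = discretize(conf, bins)
    joint_correct = obs_model[True][o] * prior
    joint_wrong = obs_model[False][o] * (1.0 - prior)
    return joint_correct / (joint_correct + joint_wrong)


def choose_model(
    conf: float,
    obs_model: Dict[bool, List[float]],
    prior: float,
    large_accuracy: float,
    cost_small: float,
    cost_large: float,
    cost_weight: float,
    bins: int = 4,
) -> Tuple[str, float]:
    """One-step decision: keep the small answer or pay for the large model."""
    b = belief_small_correct(conf, obs_model, prior, bins)
    # cost_small covers small-model generation plus verification, already spent.
    reward_stop = b - cost_weight * cost_small
    reward_route = large_accuracy - cost_weight * (cost_small + cost_large)
    return ("small" if reward_stop >= reward_route else "large"), b
```

With toy samples where high confidences accompany correct answers and low confidences accompany wrong ones, a confidence of 0.95 gives a belief of 0.8 and keeps the small answer, while 0.05 gives 0.2 and routes to the large model (assuming `large_accuracy=0.9`, costs 1 and 10, and `cost_weight=0.02`). Because the observation model is learned rather than a hard threshold, a noisy verifier that is only weakly informative shifts the belief less.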