Abstract: Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM. Central to AutoMix are two key technical contributions. First, it has a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring extensive training. Second, given that self-verification can be noisy, it employs a POMDP-based router that can effectively select an appropriately sized model, based on answer confidence. Experiments across five language models and five challenging datasets show that AutoMix consistently surpasses strong baselines, reducing computational cost by over 50% for comparable performance.

What problem does this paper attempt to address?

This paper proposes AutoMix, a method for making effective use of large language models (LLMs) that come in different sizes and configurations. As the range of available LLMs grows, choosing a model that balances computational cost against performance becomes difficult. AutoMix addresses this by strategically routing queries to larger models based on the estimated correctness of outputs from a smaller model.

AutoMix makes two key technical contributions. First, a few-shot self-verification mechanism estimates the reliability of the model's own outputs without extensive training. Second, because self-verification can be noisy, a router based on a Partially Observable Markov Decision Process (POMDP) selects an appropriately sized model based on answer confidence. Experiments across five language models and five datasets show that AutoMix surpasses strong baselines, reducing computational cost by more than 50% at comparable performance.

The challenges discussed in the paper include determining the optimal model configuration, handling complex and variable tasks, and working within the constraints of black-box APIs. AutoMix addresses these through self-verification and the POMDP router, which can be learned from limited data and adapt to scenarios that differ in the number, cost, and capability of the available models. Compared with existing model-switching strategies, AutoMix does not require additional training for model routing and can learn from a small amount of data. By combining self-verification with model size selection, it improves efficiency and reduces wasted resources, especially on complex tasks.

In conclusion, AutoMix is a framework for automatically mixing language models of different scales. It optimizes the trade-off between cost and performance through self-verification and routing, giving users a more efficient way to use language models.
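The answer-verify-route loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the POMDP router is replaced here by a plain confidence threshold, and the names `small_lm`, `large_lm`, `verify`, `threshold`, and `k` are hypothetical placeholders for whatever black-box model calls and settings a user would supply.

```python
from typing import Callable, Tuple

# Hypothetical type aliases: any callable wrapping a black-box model API would do.
Model = Callable[[str], str]
Verifier = Callable[[str, str], bool]


def verification_confidence(verify: Verifier, question: str, answer: str, k: int = 8) -> float:
    """Estimate answer reliability as the fraction of k verifier calls
    that judge the answer correct (a stand-in for few-shot self-verification)."""
    votes = [verify(question, answer) for _ in range(k)]
    return sum(votes) / k


def route(
    question: str,
    small_lm: Model,
    large_lm: Model,
    verify: Verifier,
    threshold: float = 0.5,
    k: int = 8,
) -> Tuple[str, str, float]:
    """Answer with the small model first; escalate to the large model only
    when the estimated confidence falls below the threshold.

    Assumption: a fixed threshold replaces the POMDP-based router, which
    additionally accounts for noise in the verifier's signal.
    """
    answer = small_lm(question)
    confidence = verification_confidence(verify, question, answer, k)
    if confidence >= threshold:
        return answer, "small", confidence
    return large_lm(question), "large", confidence


if __name__ == "__main__":
    # Stub models for illustration only; real use would call LLM APIs.
    small = lambda q: "4"
    large = lambda q: "four"
    always_yes = lambda q, a: True
    print(route("What is 2 + 2?", small, large, always_yes))
```

The cost saving comes from the fact that the large model is only called for queries whose small-model answer fails verification. A fixed threshold is the simplest possible policy; it ignores the noise in the verifier that motivates the POMDP formulation in the paper.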

AutoMix: Automatically Mixing Language Models

Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback

BiMix: Bivariate Data Mixing Law for Language Model Pretraining

Mixture-of-Agents Enhances Large Language Model Capabilities

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning

Aioli: A Unified Optimization Framework for Language Model Data Mixing

RegMix: Data Mixture as Regression for Language Model Pre-training

Harnessing Hard Mixed Samples with Decoupled Regularizer

Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral

Efficient Online Data Mixing For Language Model Pre-Training

AutoMix: Unveiling the Power of Mixup for Stronger Classifiers

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

Large Language Models Synergize with Automated Machine Learning

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

No Need to Talk: Asynchronous Mixture of Language Models

Large Language Model Confidence Estimation via Black-Box Access

Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost