Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models

Surya Narayanan Hari, Matt Thomson
2023-08-24
Abstract: The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive router to predict downstream model performance on prompts and then makes a routing decision using an objective function that integrates performance predictions with user goals and constraints that are incorporated through flags (e.g., model size, model recency). Tryage allows users to explore a Pareto front and automatically trade off between task accuracy and secondary goals including minimization of model size, recency, security, verbosity, and readability. Across heterogeneous data sets that include code, text, clinical data, and patents, the Tryage framework surpasses Gorilla and GPT 3.5 Turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT 3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how routing models can be applied to program and control the behavior of multi-model LLM systems to maximize efficient use of the expanding and evolving language model ecosystem.
Machine Learning, Artificial Intelligence, Computation and Language, Multiagent Systems
What problem does this paper attempt to address?
The paper aims to address the challenges users face in selecting and optimizing models for specific workflows and data domains within the current large language model (LLM) ecosystem. Specifically, the paper proposes a perceptive routing system called Tryage, which can automatically select the best expert model by dynamically analyzing user input prompts. At the core of the Tryage system is a language model router that predicts the performance of downstream models based on input prompts and makes routing decisions through an objective function, taking into account user constraints (such as model size and model recency).

The main issues addressed include:

1. **Model Selection Dilemma**: With over 200,000 models in the Hugging Face ecosystem, users find it challenging to choose the model that best suits their needs.
2. **Personalized Needs Fulfillment**: Users may want to balance task accuracy with other goals (such as model size, security, and readability).
3. **Multi-Domain Data Processing**: Datasets from different domains (such as code, text, and clinical data) require different model handling approaches.
4. **Real-Time Dynamic Routing**: The ability to dynamically select the optimal model in real-world applications to adapt to ever-changing datasets and user needs.

By introducing the Tryage system, the paper demonstrates how to leverage routing models to control the efficient operation of multi-model LLM systems, achieving higher performance than existing methods.
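
The routing decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the expert names, sizes, ages, predicted losses, and the linear weighted-sum form of the objective are all hypothetical. In Tryage the per-model predictions would come from the learned router; here they are hard-coded so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """Hypothetical metadata for one model in the library."""
    name: str
    size_b: float      # parameter count in billions (made-up values)
    age_years: float   # time since release (made-up values)


def route(predicted_loss, experts, size_weight=0.0, age_weight=0.0):
    """Pick the expert with the lowest combined objective.

    Assumed objective: predicted loss plus weighted penalties for model
    size and age. The weights play the role of user flags.
    """
    def score(expert):
        return (
            predicted_loss[expert.name]
            + size_weight * expert.size_b
            + age_weight * expert.age_years
        )

    return min(experts, key=score)


# Hypothetical model library.
experts = [
    Expert("small-general", size_b=0.1, age_years=3.0),
    Expert("code-expert", size_b=1.0, age_years=1.0),
    Expert("large-general", size_b=7.0, age_years=0.5),
]

# Stand-in for the router's per-prompt loss predictions (lower is better).
predicted_loss = {"small-general": 2.4, "code-expert": 1.6, "large-general": 1.5}

best_accuracy = route(predicted_loss, experts)
best_balanced = route(predicted_loss, experts, size_weight=0.1)
best_smallest = route(predicted_loss, experts, size_weight=1.0)

print(best_accuracy.name, best_balanced.name, best_smallest.name)
```

Sweeping `size_weight` from 0 upward moves the selection from the most accurate model toward the smallest one, which is one simple way to trace out the accuracy-versus-size trade-off the summary refers to.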