Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models

Surya Narayanan Hari, Matt Thomson

2023-08-24

Abstract: The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive router to predict downstream model performance on prompts and then makes a routing decision using an objective function that integrates performance predictions with user goals and constraints that are incorporated through flags (e.g., model size, model recency). Tryage allows users to explore a Pareto front and automatically trade off between task accuracy and secondary goals including minimization of model size, recency, security, verbosity, and readability. Across heterogeneous data sets that include code, text, clinical data, and patents, the Tryage framework surpasses Gorilla and GPT 3.5 Turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT 3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how routing models can be applied to program and control the behavior of multi-model LLM systems to maximize efficient use of the expanding and evolving language model ecosystem.

Machine Learning, Artificial Intelligence, Computation and Language, Multiagent Systems

What problem does this paper attempt to address?

The paper addresses the challenges users face in selecting and optimizing models for specific workflows and data domains within the current large language model (LLM) ecosystem. It proposes a perceptive routing system called Tryage, which automatically selects the best expert model by dynamically analyzing user input prompts. At the core of Tryage is a language model router that predicts the performance of downstream models on an input prompt and makes routing decisions through an objective function that takes user constraints (such as model size and model recency) into account. The main issues addressed include:

1. **Model Selection Dilemma**: With over 200,000 models in the Hugging Face ecosystem, users find it challenging to choose the model that best suits their needs.
2. **Personalized Needs Fulfillment**: Users may want to balance task accuracy with other goals (such as model size, security, and readability).
3. **Multi-Domain Data Processing**: Datasets from different domains (such as code, text, and clinical data) require different model handling approaches.
4. **Real-Time Dynamic Routing**: The system must dynamically select the optimal model in real-world applications to adapt to ever-changing datasets and user needs.

By introducing the Tryage system, the paper demonstrates how routing models can control multi-model LLM systems efficiently, achieving higher model-selection accuracy than existing methods.
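
The routing decision described above can be illustrated with a minimal sketch. It assumes the objective is a predicted per-prompt loss plus a weighted penalty on model size; the `Expert` class, the `route` function, the `size_weight` flag, and all numbers are hypothetical and are not taken from the paper. In Tryage the performance predictions would come from the trained router; here they are hard-coded for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """Hypothetical record for one model in the library."""
    name: str
    predicted_loss: float  # assumed router prediction for this prompt (lower is better)
    size_b: float          # assumed parameter count, in billions


def route(experts, size_weight=0.0):
    """Return the expert minimizing predicted loss plus a weighted size penalty.

    size_weight stands in for a user flag: 0.0 means accuracy only,
    larger values favor smaller models.
    """
    if not experts:
        raise ValueError("model library is empty")
    return min(experts, key=lambda e: e.predicted_loss + size_weight * e.size_b)


# Made-up predictions for a single prompt.
experts = [
    Expert("large-general", predicted_loss=0.40, size_b=7.0),
    Expert("mid-clinical", predicted_loss=0.48, size_b=1.5),
    Expert("small-code", predicted_loss=0.55, size_b=0.3),
]

# Sweeping the weight moves the choice from the most accurate model
# toward the smallest one.
for weight in (0.0, 0.05, 0.5):
    print(weight, route(experts, size_weight=weight).name)
```

Sweeping `size_weight` in this way is one simple means of tracing a trade-off between predicted accuracy and a secondary goal; other constraints (recency, security) could be added as further weighted terms under the same assumption.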
