Abstract: The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face challenges in choosing an appropriate LLM for their tasks that balances result quality with cost.

What problem does this paper attempt to address?

This paper addresses how to reduce the inference cost of large language models (LLMs) while preserving result quality on natural language processing (NLP) tasks. As the number of parameters in LLMs grows, performance improves significantly, but the cost of deployment and use also rises substantially, which is a heavy burden for end-users. AI service providers usually offer multiple versions of LLMs at different prices and performance levels, yet users struggle to choose a model suited to their tasks, especially when no ground-truth labels are available as a reference.

To solve this problem, the paper proposes a framework named SMART, which minimizes the inference cost of NLP tasks by automatically adjusting the model scale while ensuring that result quality meets the user's accuracy requirement. SMART lets users specify an accuracy constraint: the probability that the output deviates from the output of the most powerful LLM must not exceed a user-defined threshold. SMART evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level, and it optimizes the trade-off between evaluation overhead and expected cost savings. It further reduces inference cost by strategically combining LLMs with different performance and cost. Overall, the goal of SMART is to cut costs for users substantially while providing output quality similar to that of the most powerful LLMs. In experiments on three real-world datasets, SMART achieves cost savings of up to 25.6 times compared with GPT-4.
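As a rough illustration of the model-selection idea, the sketch below treats the accuracy constraint as a cap on how often a cheaper model may disagree with the most powerful LLM. It is not SMART's actual procedure: the Hoeffding confidence bound, the function names, the model names, and the sample counts are all assumptions made for this example. Each cheaper model is assumed to have been compared against the reference model on a small labeled-by-reference sample, and the cheapest model whose agreement rate is provably high enough is chosen.

```python
import math


def agreement_lower_bound(matches: int, n: int, confidence: float) -> float:
    """Hoeffding lower confidence bound on the true agreement rate.

    With probability at least `confidence`, the true agreement rate is
    no smaller than the returned value.
    """
    if n == 0:
        return 0.0
    delta = 1.0 - confidence
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, matches / n - eps)


def pick_cheapest_model(candidates, sample_agreement, reference,
                        max_deviation, confidence=0.95):
    """Return the cheapest model that satisfies the deviation cap.

    candidates:       list of (name, cost_per_item) for the cheaper models
    sample_agreement: name -> (matches, n), i.e. how often the model's output
                      matched the reference model on n sampled items
    reference:        name of the most powerful model (hypothetical fallback)
    max_deviation:    user-defined cap on the deviation probability
    """
    for name, _cost in sorted(candidates, key=lambda c: c[1]):
        matches, n = sample_agreement[name]
        if agreement_lower_bound(matches, n, confidence) >= 1.0 - max_deviation:
            return name
    # No cheaper model is provably good enough: use the reference model.
    return reference


# Hypothetical models and evaluation counts, for illustration only.
candidates = [("small", 1.0), ("medium", 5.0)]
sample_agreement = {"small": (850, 1000), "medium": (970, 1000)}

choice = pick_cheapest_model(candidates, sample_agreement,
                             reference="large", max_deviation=0.10)
print(choice)  # medium
```

The sample size matters here: a larger evaluation sample tightens the bound and makes it easier to certify a cheap model, but it also costs more to collect, which is a simplified view of the trade-off between evaluation overhead and expected savings that the summary mentions. This sketch also ignores the statistical correction needed when several models are tested at once.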

SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees
