What is the Role of Small Models in the LLM Era: A Survey

Lihu Chen, Gaël Varoquaux

2024-09-30

Abstract: Large Language Models (LLMs) have made significant progress in advancing artificial general intelligence (AGI), leading to the development of increasingly large models such as GPT-4 and LLaMA-405B. However, scaling up model sizes results in exponentially higher computational costs and energy consumption, making these models impractical for academic researchers and businesses with limited resources. At the same time, Small Models (SMs) are frequently used in practical settings, although their significance is currently underestimated. This raises important questions about the role of small models in the era of LLMs, a topic that has received limited attention in prior research. In this work, we systematically examine the relationship between LLMs and SMs from two key perspectives: Collaboration and Competition. We hope this survey provides valuable insights for practitioners, fostering a deeper understanding of the contribution of small models and promoting more efficient use of computational resources. The code is available at https://github.com/tigerchen52/role_of_small_models

Computation and Language

What problem does this paper attempt to address?

This paper explores the role and value of small models (SMs) in the era of large language models (LLMs). Specifically, it systematically analyzes the relationship between LLMs and SMs from two key perspectives: collaboration and competition.

1. **Collaboration**:
   - **Data Curation**: Small models can curate pre-training data and instruction-tuning data (for example, through data selection and data re-weighting) to improve the performance of large models.
   - **Weak-to-Strong Paradigm**: Weaker (smaller) models supervise and align stronger (larger) models, enabling large models to surpass the limitations of their weaker supervisors.
   - **Efficient Inference**: Model ensembling (such as model cascading and model routing) and speculative decoding optimize the inference process of large models, reducing costs and improving efficiency.
   - **Evaluating LLMs**: Small models serve as proxy models that automatically evaluate text generated by large models from multiple perspectives (such as factuality and fluency).
   - **Domain Adaptation**: Small models help adapt large models to specific tasks or domains.
2. **Competition**:
   - The paper also discusses the unique advantages of small models in certain scenarios, such as simplicity, low cost, and higher interpretability, and analyzes the importance of these advantages in specific applications.

In summary, this paper aims to provide valuable insights for practitioners, promote a deeper understanding of the contributions of small models, and drive more efficient use of computational resources.
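To make the model cascading idea mentioned under Efficient Inference more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the survey's implementation: the `Model` interface (a callable returning an answer and a confidence score), the `cascade` function, the `threshold` value, and the toy models are all hypothetical names introduced here.

```python
from typing import Callable, Tuple

# Hypothetical interface: a model maps a prompt to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, small: Model, large: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try the small model first; fall back to the large model if it is unsure.

    Returns (answer, source), where source names the model that answered.
    The threshold is an assumed tuning knob, not a value from the survey.
    """
    answer, confidence = small(prompt)
    if confidence >= threshold:
        return answer, "small"
    # The small model is not confident enough, so pay for the large model.
    answer, _ = large(prompt)
    return answer, "large"


# Toy stand-ins so the sketch runs without any real model.
def toy_small(prompt: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short prompts.
    confidence = 0.9 if len(prompt) < 20 else 0.3
    return "small:" + prompt, confidence


def toy_large(prompt: str) -> Tuple[str, float]:
    return "large:" + prompt, 0.99


if __name__ == "__main__":
    print(cascade("2+2?", toy_small, toy_large))
    print(cascade("a much longer and harder question", toy_small, toy_large))
```

In a real system the confidence signal might come from token probabilities or a separate verifier; the design choice is only that the cheap model handles easy inputs and the expensive model is called when the cheap one is unsure.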

Small Language Models: Survey, Measurements, and Insights

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

A Survey of Small Language Models

Large Language Models are legal but they are not: Making the case for a powerful LegalLLM

Large Language Models: A Survey

Computational Bottlenecks of Training Small-scale Large Language Models

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

A Survey of Large Language Models

Large Language Models on Small Resource-Constrained Systems: Performance Characterization, Analysis and Trade-offs

An Interdisciplinary Outlook on Large Language Models for Scientific Research

Large language models (LLMs): survey, technical frameworks, and future challenges

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Super Tiny Language Models

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

ChatGPT Alternative Solutions: Large Language Models Survey

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey