What is the Role of Small Models in the LLM Era: A Survey

Lihu Chen, Gaël Varoquaux
2024-09-30
Abstract: Large Language Models (LLMs) have made significant progress in advancing artificial general intelligence (AGI), leading to the development of increasingly large models such as GPT-4 and LLaMA-405B. However, scaling up model sizes results in exponentially higher computational costs and energy consumption, making these models impractical for academic researchers and businesses with limited resources. At the same time, Small Models (SMs) are frequently used in practical settings, although their significance is currently underestimated. This raises important questions about the role of small models in the era of LLMs, a topic that has received limited attention in prior research. In this work, we systematically examine the relationship between LLMs and SMs from two key perspectives: Collaboration and Competition. We hope this survey provides valuable insights for practitioners, fostering a deeper understanding of the contribution of small models and promoting more efficient use of computational resources. The code is available at https://github.com/tigerchen52/role_of_small_models
Computation and Language
What problem does this paper attempt to address?
This paper explores the role and value of small models (SMs) in the era of large language models (LLMs). Specifically, it systematically analyzes the relationship between LLMs and SMs from two key perspectives: collaboration and competition.

1. **Collaboration**:
   - **Data Curation**: Small models can curate pre-training data and instruction-tuning data, for example through data selection and data re-weighting, to improve the performance of large models.
   - **Weak-to-Strong Paradigm**: Weaker (smaller) models supervise and align stronger (larger) models, and the large models can surpass the limitations of their weaker supervisors.
   - **Efficient Inference**: Model ensembling (such as model cascading and model routing) and speculative decoding reduce the cost and improve the efficiency of large-model inference.
   - **Evaluating LLMs**: Small models serve as proxy models that automatically evaluate text generated by large models from multiple perspectives, such as factuality and fluency.
   - **Domain Adaptation**: Small models help adapt large models to specific tasks or domains.
2. **Competition**:
   - The paper also discusses the advantages small models have in certain scenarios, such as simplicity, low cost, and higher interpretability, and analyzes why these advantages matter in specific applications.

In summary, this paper aims to provide valuable insights for practitioners, promote a deeper understanding of the contributions of small models, and drive more efficient use of computational resources.
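As an illustration of the model cascading mentioned under efficient inference, the following is a minimal sketch and not code from the survey or its repository. It assumes each model is a callable that returns an answer together with a confidence score in [0, 1]. The names `cascade`, `toy_small`, `toy_large`, and the `threshold` parameter are hypothetical and chosen only for this example.

```python
from typing import Callable, Tuple

# Assumption: a "model" is any callable mapping a query to (answer, confidence).
Model = Callable[[str], Tuple[str, float]]


def cascade(query: str, small: Model, large: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try the small model first; fall back to the large one if it is unsure.

    Returns (answer, name_of_model_that_answered).
    """
    answer, confidence = small(query)
    if confidence >= threshold:
        # The small model is confident enough, so the large model is never called.
        return answer, "small"
    # Otherwise pay the cost of the large model.
    answer, _ = large(query)
    return answer, "large"


# Hypothetical stand-ins for real models, used only to exercise the logic.
def toy_small(query: str) -> Tuple[str, float]:
    # Pretend the small model is confident only on short queries.
    confidence = 0.9 if len(query.split()) <= 3 else 0.3
    return "small-answer", confidence


def toy_large(query: str) -> Tuple[str, float]:
    return "large-answer", 0.99


if __name__ == "__main__":
    print(cascade("hi there", toy_small, toy_large))
    print(cascade("a much longer and harder query", toy_small, toy_large))
```

In a setup like this, the threshold trades cost against quality: a higher value sends more queries to the large model, a lower value keeps more of them on the small one.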