Abstract: This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions. Our findings reveal that small models can effectively classify texts, getting on par with or surpassing their larger counterparts. We developed and shared a comprehensive open-source repository that encapsulates our methodologies. This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges.
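To make "scoring functions" concrete, here is a minimal sketch of how a prompted model's output can be turned into a class prediction. It assumes the model has already produced per-token log-probabilities for each label's verbalizer; the two scoring rules shown (summed log-probability and length-normalized mean) are common choices assumed for illustration, not necessarily the ones benchmarked in the paper. The function names and the numbers are hypothetical.

```python
def score_labels(token_logprobs, scoring="sum"):
    """Score each candidate label from the log-probabilities of its tokens.

    token_logprobs: dict mapping label -> list of per-token log-probs
                    (hypothetical values; a real model would supply them).
    scoring: "sum"  -> total log-probability of the label text
             "mean" -> length-normalized (average) log-probability
    """
    scores = {}
    for label, logps in token_logprobs.items():
        if scoring == "sum":
            scores[label] = sum(logps)
        elif scoring == "mean":
            scores[label] = sum(logps) / len(logps)
        else:
            raise ValueError(f"unknown scoring function: {scoring}")
    return scores


def classify(token_logprobs, scoring="sum"):
    """Return the label with the highest score under the chosen rule."""
    scores = score_labels(token_logprobs, scoring)
    return max(scores, key=scores.get)


# Made-up log-probs for a prompt such as "Review: ... Sentiment:".
# "negative" is assumed to tokenize into two tokens, "positive" into one.
example = {
    "positive": [-0.9],
    "negative": [-0.5, -0.6],
}

print(classify(example, scoring="sum"))   # summed log-prob favors the shorter label
print(classify(example, scoring="mean"))  # length normalization removes that bias
```

In this toy case the two rules disagree because the labels differ in token length. Whether such differences matter in practice is the empirical question the study examines.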

What problem does this paper attempt to address?

The core problem this paper addresses is: **Do we need large language models (LLMs) to solve text classification problems by prompting, or can small language models achieve similar or even better results?**

### Specific problem decomposition:

1. **Impact of model scale**: The paper explores the relationship between the number of model parameters and zero-shot classification performance. Does a larger model necessarily lead to better classification results?
2. **Impact of architecture selection**: How do different model architectures (such as encoder-decoder vs. decoder-only) affect zero-shot classification performance?
3. **Role of fine-tuning strategies**: Can instruction fine-tuning significantly improve the performance of small models? Does its effect depend on the specific model architecture or dataset?
4. **Choice of scoring functions**: Does the choice of scoring function have a significant impact on model performance?

### Main objectives of the paper:

By comparing the zero-shot classification performance of language models of different scales, architectures, and fine-tuning strategies on multiple datasets, the paper evaluates the potential of small language models in this task and challenges the current mainstream view of "the bigger, the better".

---

### Summary of conclusions:

1. **Model scale is not a decisive factor**: On many datasets, there is no significant correlation between model size and performance. Some datasets (such as `cdr`) show a positive correlation, while other datasets (such as `ethos` and `imdb`) show a negative correlation.
2. **Architecture selection is crucial**: For some datasets (such as `agnews`, `bbcnews`, `sms`, etc.), the model architecture has a significant impact on performance. For example, the encoder-decoder architecture may be more suitable for specific tasks.
3. **The effect of instruction fine-tuning varies by dataset**: Instruction fine-tuning significantly improves performance on some datasets (such as `agnews`, `ethos`, `imdb`, etc.), but has an insignificant or even slightly negative effect on other datasets (such as `bbcnews`, `youtube`, `sms`).
4. **Scoring functions have limited impact**: Regardless of the model architecture, the choice of scoring function has no significant impact on performance.

---

### Significance in scientific research:

This paper provides a new perspective for text classification tasks in resource-constrained scenarios, indicating that small language models can be an effective alternative to large models in some cases. This not only helps to reduce computing costs but also provides more flexibility for practical applications. At the same time, the paper reveals the importance of model architecture and fine-tuning strategies, providing a direction for further optimizing zero-shot classification tasks.
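The conclusion on model scale is a statement about correlation between parameter count and per-dataset performance. As a rough sketch of how such a relationship could be checked, the snippet below computes a Spearman rank correlation in plain Python. This summary does not say which statistical test the paper uses, so the choice of Spearman is an assumption. Only the 77M and 40B endpoints come from the abstract; the intermediate sizes and all scores are made-up placeholders, not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Model sizes in millions of parameters (intermediate values are hypothetical).
sizes = [77, 250, 780, 3000, 11000, 40000]

# Placeholder scores for three imaginary datasets, one score per model size.
scores_rising = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95]   # score grows with size
scores_falling = list(reversed(scores_rising))           # score shrinks with size
scores_flat = [0.62, 0.58, 0.66, 0.60, 0.64, 0.61]      # no clear trend

for name, scores in [("rising", scores_rising),
                     ("falling", scores_falling),
                     ("flat", scores_flat)]:
    print(name, round(spearman(sizes, scores), 3))
```

A coefficient near +1 or -1 corresponds to the positive or negative relationships described above, while a value near 0 corresponds to the "no significant correlation" case. A real analysis would also need a significance test and tie handling.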

Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Small Language Models: Survey, Measurements, and Insights

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Open, Closed, or Small Language Models for Text Classification?

Automatic Text Classification With Large Language Models: A Review of openai for Zero- and Few-Shot Classification

Large Language Models as Zero-Shot Conversational Recommenders

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level

Large Language Models Are Zero-Shot Text Classifiers

Large Language Models are Strong Zero-Shot Retriever

What is the Role of Small Models in the LLM Era: A Survey

Small Language Models for Application Interactions: A Case Study

A Survey of Small Language Models

Language Models for Text Classification: Is In-Context Learning Enough?

Evaluation of medium-large Language Models at zero-shot closed book generative question answering

Mini-Giants: "Small" Language Models and Open Source Win-Win

Small Language Models Improve Giants by Rewriting Their Outputs

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

Large Language Models are Zero-Shot Reasoners