Small Language Models Improve Giants by Rewriting Their Outputs

Giorgos Vernikos, Arthur Bražinskas, Jakub Adamek, Jonathan Mallinson, Aliaksei Severyn, Eric Malmi
2024-02-01
Abstract: Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our approach directly targets LLM predictions without requiring access to their weights. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates to produce an enhanced output. Our experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning. Furthermore, we illustrate the robustness of LMCor against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that LMCor can be seamlessly integrated with different LLMs at inference, serving as a plug-and-play module to improve their performance.
Computation and Language, Machine Learning
### What problem does this paper attempt to address?

This paper aims to improve the performance of large language models (LLMs) on specific tasks. In particular, it targets three issues:

1. **Enhancing performance using training data**: Although LLMs perform well across many tasks, they often lag behind smaller models fine-tuned for a specific task, because in-context learning uses only a fraction of the available training data. The paper proposes a method that leverages the existing training data to improve LLM performance without fine-tuning the LLM.
2. **Reducing the need for prompt engineering**: Few-shot prompting often requires extensive prompt engineering, which is time-consuming and does not guarantee better performance. The proposed method reduces the reliance on careful prompt design by merging and correcting the candidate outputs.
3. **No need to access model weights**: Unlike fine-tuning, the proposed method operates directly on the outputs generated by the LLM and never needs its weights. This makes it suitable for commercial models that can only be accessed through restricted inference APIs.

### Main Contributions

- Introduces the LM-corrector (LMCor), a small model that improves LLM performance by merging and correcting multiple candidate outputs generated by the LLM, without accessing its weights.
- Reports experiments on four natural language generation tasks, showing that even a small LMCor model (250 million parameters) substantially improves the few-shot performance of an LLM with 62 billion parameters, matching and even outperforming standard fine-tuning.
- Shows that LMCor is robust to different prompts, reducing the need for extensive prompt engineering.
- Demonstrates that LMCor can be integrated with different LLMs at inference time as a plug-and-play module to improve their performance.
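
The candidate-then-correct flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `build_corrector_input` and `correct`, the `sample_candidates` and `corrector` callables, and the `source:` / `candidate k:` input layout are hypothetical and are not taken from the paper, which may format the corrector input differently.

```python
from typing import Callable, List


def build_corrector_input(source: str, candidates: List[str], sep: str = " | ") -> str:
    """Concatenate the task input and the LLM candidates into one string.

    The labels and separator are illustrative assumptions, not the
    paper's actual input format.
    """
    parts = [f"source: {source}"]
    parts += [f"candidate {i}: {c}" for i, c in enumerate(candidates, start=1)]
    return sep.join(parts)


def correct(
    source: str,
    sample_candidates: Callable[[str, int], List[str]],
    corrector: Callable[[str], str],
    num_candidates: int = 3,
) -> str:
    """Ask a black-box LLM for candidates, then let a small model merge them.

    `sample_candidates` stands in for few-shot prompting of the LLM and
    `corrector` for the trained small model; both are hypothetical callables.
    """
    # Only the LLM's text outputs are used; its weights are never accessed.
    candidates = sample_candidates(source, num_candidates)
    return corrector(build_corrector_input(source, candidates))
```

Because the LLM appears only as a callable that returns text, the same corrector could in principle be paired with a different LLM by passing another `sample_candidates` function, which mirrors the plug-and-play use described in the summary.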