LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji
2024-01-29
Abstract: Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizable, we design our instructions to cover a broad set of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families, with varying sizes. We evaluate the performance of our models using automatic metrics on 15 different natural language processing (NLP) benchmarks, as well as through human assessment. The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines, while being much smaller in size.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the high resource consumption and environmental impact of large language models (LLMs). Although instruction-tuned LLMs show excellent generative capabilities, they require substantial computational resources for both training and inference. This limits their use in resource-constrained environments and raises concerns about energy consumption and environmental impact. The authors therefore explore distilling knowledge from LLMs into much smaller models to balance performance against resource consumption. Specifically, they build a large-scale dataset of 2.58 million instructions and use it to fine-tune a series of models with different architectures and sizes, collectively named LaMini-LM. These models have far fewer parameters while maintaining high performance, which reduces resource requirements and makes them easier to deploy in resource-constrained environments. The authors also evaluate the models on a variety of natural language processing tasks and test them for hallucination and toxic content generation to assess their safety and reliability.
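As a rough illustration of the data side of this kind of distillation, the sketch below pairs each instruction with a response from a teacher model, producing the instruction-response examples a smaller model could then be fine-tuned on. It is not the authors' pipeline: `build_distillation_set` and `stub_teacher` are hypothetical names, the teacher is a local stub rather than a real call to gpt-3.5-turbo, and the skipping of empty or duplicate instructions is an assumption made for the example.

```python
from typing import Callable, Dict, List


def build_distillation_set(
    instructions: List[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each unique, non-empty instruction with the teacher's response."""
    seen = set()
    examples = []
    for instruction in instructions:
        key = instruction.strip()
        if not key or key in seen:
            continue  # assumption: skip empty and duplicate instructions
        seen.add(key)
        examples.append({"instruction": key, "response": teacher(key)})
    return examples


def stub_teacher(instruction: str) -> str:
    # Hypothetical stand-in for a request to the teacher model.
    return f"[teacher response to: {instruction}]"


pairs = build_distillation_set(
    ["Name a prime number.", "Name a prime number.", "  ", "Define distillation."],
    stub_teacher,
)
for pair in pairs:
    print(pair)
```

In a real setting the stub would be replaced by a request to the teacher model, and the resulting pairs would serve as supervised fine-tuning data for the student.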