LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji

2024-01-29

Abstract: Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizable, we design our instructions to cover a broad set of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families, with varying sizes. We evaluate the performance of our models using automatic metrics on 15 different natural language processing (NLP) benchmarks, as well as through human assessment. The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines, while being much smaller in size.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the resource consumption and environmental impact of large language models (LLMs). Although instruction fine-tuned LLMs demonstrate excellent generation capabilities, they require large amounts of computation for both training and inference. This limits their use in resource-constrained environments and raises concerns about energy consumption and environmental impact. The authors therefore explore distilling knowledge from large language models into much smaller ones to balance performance against resource consumption. Specifically, they build a large-scale dataset of 2.58 million instructions and use it to fine-tune a series of models with different architectures and sizes, collectively named LaMini-LM. These models have far fewer parameters while maintaining high performance, which lowers resource requirements and makes them easier to deploy in resource-constrained environments. The authors also evaluate the models on a variety of natural language processing tasks and test them for hallucinations and toxic content to assess their safety and reliability.
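The data side of this approach, pairing each instruction with a teacher-generated response before fine-tuning a smaller student, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: `build_distillation_set`, `to_training_text`, the prompt template, and `stub_teacher` (a stand-in for a call to a teacher model such as gpt-3.5-turbo) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    instruction: str
    response: str


def build_distillation_set(
    instructions: Iterable[str],
    teacher: Callable[[str], str],
) -> List[Example]:
    """Pair each unique instruction with the teacher model's response."""
    seen = set()
    examples = []
    for raw in instructions:
        instruction = raw.strip()
        key = instruction.lower()
        if not instruction or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        examples.append(Example(instruction, teacher(instruction)))
    return examples


def to_training_text(example: Example) -> str:
    """Hypothetical prompt template for a decoder-only student (an assumption)."""
    return (
        f"### Instruction:\n{example.instruction}\n\n"
        f"### Response:\n{example.response}"
    )


def stub_teacher(instruction: str) -> str:
    # Stand-in for a real teacher LLM call; returns a placeholder string.
    return f"[teacher answer to: {instruction}]"


if __name__ == "__main__":
    data = build_distillation_set(
        ["What is NLP?", "what is nlp? ", "", "Define distillation."],
        stub_teacher,
    )
    for ex in data:
        print(to_training_text(ex))
        print("---")
```

The resulting texts would then be fed to an ordinary supervised fine-tuning loop for the student model; for an encoder-decoder student, the instruction and response would presumably be kept as separate source and target sequences instead of one concatenated string.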

MiniLLM: Knowledge Distillation of Large Language Models

Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers

Align^2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

Instruction Mining: Instruction Data Selection for Tuning Large Language Models

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies