LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Dongheng Li, Yongchang Hao, Lili Mou

2024-09-19

Abstract: Large language models have become increasingly popular and have demonstrated remarkable performance on various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets for dialogue generation and summarization. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods across different tasks and datasets.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the following problem: although large language models (LLMs) perform excellently on a wide range of natural language processing tasks, they are computationally expensive and resource-intensive, which makes them difficult to deploy in resource-constrained environments. To address this, the paper proposes LLMR, a knowledge distillation (KD) method that combines reinforcement learning with a reward induced from a large language model. The goal is to transfer the knowledge of a large language model efficiently to a smaller, lighter student model while alleviating the exposure bias problem of traditional KD methods. Specifically, the main contributions of the paper are:

1. **Proposing the LLMR method**: A reward function is induced from the prediction probabilities of a large language model, and the student is trained with reinforcement learning to maximize this reward, which achieves knowledge distillation.
2. **Alleviating exposure bias**: During training, traditional KD methods condition the student on sequences produced by the teacher model, whereas during inference the student must condition on its own predictions. This mismatch between training and inference is the exposure bias problem. LLMR lets the student explore by generating its own sequences during training, which mitigates the mismatch.
3. **Experimental validation**: Experiments on dialogue generation and text summarization show that LLMR outperforms traditional KD methods on multiple datasets.

In summary, the paper aims to improve the performance of small models in resource-constrained environments through a new knowledge distillation method, bringing them closer to the performance of the large language model they are distilled from.
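The mechanism described above, in which the student samples its own sequences and a reward derived from the teacher's predicted probabilities drives a reinforcement-learning update, can be illustrated with a toy sketch. This summary does not give the exact form of the LLMR reward, so the sketch assumes a simple stand-in: the teacher's log-probability of each token the student emits. The teacher is a hypothetical table of next-token logits conditioned only on the previous token, the student is a tabular softmax policy, and the update is plain REINFORCE with return-to-go and a per-position batch-mean baseline. All names (`TEACHER_LOGITS`, `teacher_reward`, `rollout`, `train`, `expected_reward`) are illustrative and not taken from the paper.

```python
import math
import random

VOCAB = 3   # toy vocabulary size (hypothetical)
LENGTH = 4  # tokens per generated sequence (hypothetical)
N_STATES = VOCAB + 1  # state 0 = begin-of-sequence, state k+1 = "previous token was k"

# Hypothetical teacher "LLM": next-token logits conditioned only on the previous token.
TEACHER_LOGITS = [
    [2.0, 0.0, -1.0],
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [2.0, 0.0, -1.0],
]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def teacher_reward(state, token):
    # Stand-in reward (assumption, not the paper's exact LLMR reward):
    # the teacher's log-probability of the token the student emitted.
    return math.log(softmax(TEACHER_LOGITS[state])[token])


def rollout(student, rng):
    # Sample from the student; its own tokens determine the next state.
    states, tokens, rewards = [], [], []
    state = 0
    for _ in range(LENGTH):
        token = sample(softmax(student[state]), rng)
        states.append(state)
        tokens.append(token)
        rewards.append(teacher_reward(state, token))
        state = token + 1
    return states, tokens, rewards


def expected_reward(student):
    # Exact expected total reward, propagating the state distribution step by step.
    dist = [1.0] + [0.0] * VOCAB
    total = 0.0
    for _ in range(LENGTH):
        nxt = [0.0] * N_STATES
        for s, ps in enumerate(dist):
            for a, pa in enumerate(softmax(student[s])):
                total += ps * pa * teacher_reward(s, a)
                nxt[a + 1] += ps * pa
        dist = nxt
    return total


def train(steps=300, batch=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    student = [[0.0] * VOCAB for _ in range(N_STATES)]  # uniform initial policy
    for _ in range(steps):
        episodes = [rollout(student, rng) for _ in range(batch)]
        # Return-to-go: G_t = r_t + r_{t+1} + ... + r_{T-1}.
        returns = []
        for _, _, rewards in episodes:
            g, acc = [], 0.0
            for r in reversed(rewards):
                acc += r
                g.append(acc)
            returns.append(g[::-1])
        # Per-position batch-mean baseline to reduce variance.
        baseline = [sum(g[t] for g in returns) / batch for t in range(LENGTH)]
        grads = [[0.0] * VOCAB for _ in range(N_STATES)]
        for (states, tokens, _), g in zip(episodes, returns):
            for t, (s, a) in enumerate(zip(states, tokens)):
                probs = softmax(student[s])
                adv = g[t] - baseline[t]
                for j in range(VOCAB):
                    # d log pi(a|s) / d logit_j = 1[j == a] - pi(j|s)
                    grads[s][j] += adv * ((1.0 if j == a else 0.0) - probs[j])
        for s in range(N_STATES):
            for j in range(VOCAB):
                student[s][j] += lr * grads[s][j] / batch  # gradient ascent
    return student


if __name__ == "__main__":
    uniform = [[0.0] * VOCAB for _ in range(N_STATES)]
    print("uniform student:", round(expected_reward(uniform), 3))
    print("trained student:", round(expected_reward(train()), 3))
```

Comparing `expected_reward` for a uniform student (all logits zero) with the student returned by `train()` should show the teacher-induced reward increasing. Because `rollout` feeds the student's own tokens back in as the next state, training visits the same states the student will encounter at inference, which is the property the summary credits with mitigating exposure bias; a word-level KD baseline would instead fit the student's next-token distribution to the teacher's on teacher-produced prefixes only. Note that a pure log-probability reward is mode-seeking: maximizing it pushes the student toward the teacher's most likely tokens rather than matching the teacher's full distribution.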
