LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Dongheng Li, Yongchang Hao, Lili Mou
2024-09-19
Abstract: Large language models have become increasingly popular and demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets in the dialogue generation and summarization tasks. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods in different tasks and datasets.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the following problem: although large language models (LLMs) perform well across many natural language processing tasks, they are computationally expensive, which makes them difficult to deploy in resource-constrained environments. The paper therefore proposes LLMR, a knowledge distillation method that uses reinforcement learning with a reward induced from a large language model. The goal is to transfer the knowledge of the large model to a smaller, lighter model while alleviating the exposure bias problem of traditional knowledge distillation methods. Specifically, the main contributions of the paper are:

1. **Proposing the LLMR method**: A reward function is induced from the prediction probabilities of a large language model, and the student model is trained with reinforcement learning against this reward.
2. **Alleviating exposure bias**: Traditional knowledge distillation methods train the student on sequences predicted by the teacher model, but at inference time the student conditions on its own predictions. This mismatch between training and inference is the exposure bias problem. LLMR lets the student explore its own outputs during training, which mitigates the mismatch.
3. **Experimental validation**: Experiments on dialogue generation and text summarization show that LLMR outperforms traditional knowledge distillation methods on multiple datasets.

In summary, the paper aims to improve the performance of small models for resource-constrained environments through a new knowledge distillation method, with the goal of keeping their quality close to that of large language models.
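The idea of sampling from the student and scoring its samples with a teacher-derived reward can be illustrated with a small toy sketch. This is not the paper's exact formulation. It assumes, purely for illustration, that the reward for each sampled token is the teacher's log-probability of that token, and that teacher and student logits are fixed per step rather than conditioned on the sampled prefix. All function names (`softmax`, `sample_sequence`, `induced_rewards`, `reinforce_loss`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(student_logits, rng):
    """Sample one token per step from the student's OWN distribution.

    Training on self-sampled tokens is what lets the student see the same
    kind of prefixes it will produce at inference time.
    """
    tokens = []
    for step_logits in student_logits:
        probs = softmax(step_logits)
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def induced_rewards(teacher_logits, tokens):
    """ASSUMPTION: reward = teacher log-probability of each sampled token."""
    return [math.log(softmax(step)[tok])
            for step, tok in zip(teacher_logits, tokens)]


def reinforce_loss(student_logits, tokens, rewards):
    """REINFORCE surrogate loss: -sum_t G_t * log pi(a_t).

    G_t is the reward-to-go (sum of rewards from step t onward).
    """
    returns, running = [], 0.0
    for r in reversed(rewards):
        running += r
        returns.append(running)
    returns.reverse()

    loss = 0.0
    for step, tok, g in zip(student_logits, tokens, returns):
        loss -= g * math.log(softmax(step)[tok])
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 3-step sequence over a 3-token vocabulary (made-up numbers).
    teacher = [[2.0, 0.0, -1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, 2.0]]
    student = [[0.0, 0.0, 0.0]] * 3  # untrained student: uniform
    tokens = sample_sequence(student, rng)
    rewards = induced_rewards(teacher, tokens)
    print(tokens, rewards, reinforce_loss(student, tokens, rewards))
```

In a real setup the logits would come from neural models conditioned on the prefix generated so far, and the loss would be minimized with a gradient-based optimizer; the sketch only shows how the three pieces (student sampling, teacher-induced reward, policy-gradient objective) fit together.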