FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

KaShun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza
2024-10-03
Abstract: Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy: both accurate and well-calibrated (the prediction confidence should align with its ground truth correctness likelihood). Nowadays, fine-tuning has become the most popular method for adapting a model to practical usage by significantly increasing accuracy on downstream tasks. Despite the great accuracy it achieves, we found fine-tuning is still far away from satisfactory trustworthiness due to "tuning-induced mis-calibration". In this paper, we delve deeply into why and how mis-calibration exists in fine-tuned models, and how distillation can alleviate the issue. Then we further propose a brand new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of teacher's knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the "concentrated knowledge" phenomenon during distillation, which can significantly reduce the computational burden. Then we apply a "trustworthy maximization" process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method, where better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
Computation and Language
What problem does this paper attempt to address?
The paper addresses "tuning-induced mis-calibration": after fine-tuning, a large language model's (LLM's) predicted confidence no longer matches how often its predictions are actually correct. Fine-tuning raises accuracy on downstream tasks, but a trustworthy model must be both accurate and well-calibrated, so a fine-tuned model can score well and still be unreliable in practical applications. Traditional knowledge distillation improves calibration to some extent, but it still carries bias because the teacher model itself may be mis-calibrated. The open problem is therefore how to improve the trustworthiness of small models while maintaining efficiency.

To address this, the paper proposes Efficient Trustworthy Distillation (FIRST). The method first exploits the "concentrated knowledge" phenomenon: the probability distribution over generated tokens is concentrated on a few high-probability tokens, so only a small portion of the teacher's knowledge needs to be used, which reduces the computational burden. It then applies a "trustworthy maximization" process to this portion of the knowledge, so that what is transferred to the student model is as accurate and well-calibrated as possible.

In summary, the paper aims to obtain a reliable language model that is both accurate and well-calibrated by efficiently using a small portion of the teacher model's knowledge, thereby improving the model's trustworthiness in practical applications.
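The two quantities the summary relies on, calibration and concentrated knowledge, can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: it uses Expected Calibration Error (ECE), a common calibration measure that the summary does not confirm the paper uses, and the function names, the bin count, the cutoff `k`, and the example logits are all hypothetical.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between accuracy and mean confidence per bin.

    Assumption: ECE with equal-width bins stands in for "mis-calibration";
    the summary does not say which metric the paper reports.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mass(probs, k):
    """Share of probability mass held by the k most likely tokens."""
    return sum(sorted(probs, reverse=True)[:k])


# Overconfident predictions: 95% confidence but only half are correct.
overconfident = expected_calibration_error([0.95] * 4, [True, True, False, False])

# Calibrated predictions: 50% confidence and half are correct.
calibrated = expected_calibration_error([0.5] * 4, [True, True, False, False])

# Hypothetical teacher logits over a six-token vocabulary.
teacher_probs = softmax([8.0, 6.0, 1.0, 0.0, -1.0, -2.0])
mass_top2 = top_k_mass(teacher_probs, k=2)
```

In this toy setup the overconfident predictions have a large gap between confidence and accuracy, the calibrated ones have none, and the two most likely tokens hold nearly all of the teacher's probability mass. The last point is the intuition behind keeping only a few top tokens per position during distillation; how FIRST selects and adjusts them is not specified in the summary.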