Abstract: Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation that LLMs be trustworthy, that is, both accurate and well-calibrated (the prediction confidence should align with the ground-truth likelihood of correctness). Fine-tuning has become the most popular method for adapting a model to practical usage, since it significantly increases accuracy on downstream tasks. Despite the accuracy it achieves, we find that fine-tuning still falls well short of satisfactory trustworthiness because of "tuning-induced mis-calibration". In this paper, we examine why and how mis-calibration arises in fine-tuned models, and how distillation can alleviate the issue. We then propose a new method named Efficient Trustworthy Distillation (FIRST), which uses a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the "concentrated knowledge" phenomenon during distillation, which can significantly reduce the computational burden. We then apply a "trustworthy maximization" process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method: better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.

What problem does this paper attempt to address?

The paper addresses "tuning-induced mis-calibration" in large language models (LLMs): after fine-tuning, a model's predicted confidence no longer matches how often its predictions are actually correct. Popular fine-tuning methods significantly improve accuracy on downstream tasks, but trustworthiness requires both accuracy and good calibration, so the resulting models remain of limited reliability in practical applications.

Traditional knowledge distillation improves calibration to some extent, but it remains biased because the teacher model may itself be mis-calibrated. The open problem is therefore how to improve the trustworthiness of small models while keeping the process efficient.

To address this, the paper proposes Efficient Trustworthy Distillation (FIRST). The method reduces the computational burden by exploiting the "concentrated knowledge" phenomenon: the probability distribution over generated tokens is concentrated on a few high-probability tokens, so only a small portion of the teacher's output needs to be used. A "trustworthy maximization" process then optimizes this portion of the knowledge before it is transferred, so that the student's accuracy and calibration are maximized. In summary, the goal is a language model that is both accurate and well-calibrated, obtained by efficiently using a portion of the teacher model's knowledge.
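The two ideas above, calibration and concentrated knowledge, can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: calibration is measured here with expected calibration error (ECE) over equal-width bins, a common metric whose exact configuration in the paper is not given in this summary, and the helper names `expected_calibration_error`, `softmax`, and `top_k_mass` as well as the toy logits are hypothetical.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the size-weighted
    average gap between mean confidence and accuracy in each bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mass(probs, k):
    """Share of total probability held by the k most likely tokens."""
    return sum(sorted(probs, reverse=True)[:k])


# Same accuracy (50%), different confidence: only the second is calibrated.
outcomes = [True, True, False, False]
overconfident = expected_calibration_error([0.95] * 4, outcomes)
calibrated = expected_calibration_error([0.5] * 4, outcomes)

# Toy next-token logits (made up): two tokens dominate the distribution.
teacher_probs = softmax([8.0, 6.0, 1.0, 0.0, -1.0, -2.0])
mass_top2 = top_k_mass(teacher_probs, k=2)
```

In this toy case both models are right half the time, but the one reporting 95% confidence has a large ECE while the one reporting 50% has none, which is the gap the summary calls mis-calibration. The second part shows why keeping only a few top tokens from the teacher can preserve most of the distribution when the probability mass is concentrated.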

FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

Self-Improving Teacher Cultivates Better Student: Distillation Calibration for Multimodal Large Language Models

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Multi-Granularity Semantic Revision for Large Language Model Distillation

Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios

TrustAL: Trustworthy Active Learning Using Knowledge Distillation

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

DistiLLM: Towards Streamlined Distillation for Large Language Models

Can a student Large Language Model perform as well as it's teacher?

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Faithful Knowledge Distillation

LLAVADI: What Matters For Multimodal Large Language Models Distillation

Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models

Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Pre-training Distillation for Large Language Models: A Design Space Exploration

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application