Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher

Hyunjong Ok, Jegwang Ryu, Jaeho Lee

2024-10-03

Abstract: How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM supervisions one can use, giving birth to many decoding algorithms that utilize supervision without further training. However, it is still unclear what an effective strategy is under the limited supervision scenario, where we assume that no more than a few tokens can be generated by LLMs. To this end, we develop an algorithm to effectively aggregate the small-scale LLM and LLM predictions on initial tokens so that the generated tokens can more accurately condition the subsequent token generation by the small-scale LLM only. Critically, we find that it is essential to adaptively overtrust or disregard the LLM prediction based on the confidence of the small-scale LLM. Through our experiments on a wide range of models and datasets, we demonstrate that our method provides a consistent improvement over conventional decoding strategies. Code: https://github.com/HJ-Ok/DecLimSup
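The abstract describes a regime where the LLM (teacher) is consulted only for the first few tokens and the small-scale LLM (student) then continues alone, conditioned on those tokens. The sketch below illustrates that regime under loudly stated assumptions: the aggregation rule (a weighted sum of log-probabilities with weight `alpha`), greedy decoding, the `teacher_budget` parameter, and the toy models `toy_student` and `toy_teacher` are all hypothetical illustrations, not the authors' implementation from the linked repository.

```python
import math
from typing import Callable, List

# Hypothetical model interface: maps a token prefix to logits over a shared vocabulary.
Model = Callable[[List[int]], List[float]]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def limited_supervision_decode(
    student: Model,
    teacher: Model,
    prompt: List[int],
    max_new_tokens: int,
    teacher_budget: int,
    alpha: float,
) -> List[int]:
    """Greedy decoding where only the first `teacher_budget` steps query the teacher.

    Supervised steps score tokens with the ASSUMED rule
    (1 - alpha) * log p_student + alpha * log p_teacher;
    all later steps use the student alone, conditioned on the tokens produced so far.
    """
    tokens = list(prompt)
    for step in range(max_new_tokens):
        scores = log_softmax(student(tokens))
        if step < teacher_budget:  # the teacher is consulted only for the initial tokens
            teacher_lp = log_softmax(teacher(tokens))
            scores = [(1 - alpha) * s + alpha * t for s, t in zip(scores, teacher_lp)]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))  # greedy argmax
    return tokens[len(prompt):]


# Toy 3-token models (hypothetical), purely for illustration.
def toy_student(prefix: List[int]) -> List[float]:
    # The student keeps emitting token 1 once it has seen it; otherwise it prefers token 0.
    return [0.0, 2.0, 0.0] if prefix[-1] == 1 else [1.0, 0.5, 0.0]


def toy_teacher(prefix: List[int]) -> List[float]:
    # The teacher always prefers token 1.
    return [0.0, 3.0, 0.0]


print(limited_supervision_decode(toy_student, toy_teacher, [0], 3, 0, 1.0))  # [0, 0, 0]
print(limited_supervision_decode(toy_student, toy_teacher, [0], 3, 1, 1.0))  # [1, 1, 1]
```

With no teacher budget the toy student settles on token 0; a single supervised token steers it onto token 1 for the rest of the sequence, which is the conditioning effect the abstract relies on.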

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses how small-scale large language models (sLLMs) can generate high-quality text when only limited supervision from a large language model (LLM) is available. Specifically, the researchers study how to improve generation quality by combining sLLM and LLM predictions when the LLM can be queried for only a few tokens. The paper finds that under this constraint, the conventional strategy of over-trusting the LLM is not always optimal, so a new algorithm is needed that dynamically decides whether to trust the teacher model (LLM) or the student model (sLLM), and to what extent. The main contributions of the paper are:

1. **Defining the problem of sLLM decoding under limited supervision**, a research direction of practical importance.
2. **Finding that under limited supervision, the conventional strategy of over-trusting the LLM is often suboptimal.**
3. **Proposing an entropy-based mechanism** that determines which of the sLLM and the LLM should be over-trusted and to what extent, and validating its effectiveness across multiple settings.

Through these contributions, the paper offers new ideas and techniques for using large language models efficiently in resource-constrained environments.
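The entropy-based mechanism in contribution 3 can be sketched as follows. This is a minimal sketch, assuming that the student's predictive entropy is compared against a single threshold and that the chosen weight `alpha` enters a log-probability interpolation; the threshold, the two weights, and the function names `entropy_from_logits`, `choose_alpha`, and `aggregate` are hypothetical and not taken from the paper.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy_from_logits(logits: List[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits)."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def choose_alpha(student_logits: List[float], threshold: float,
                 alpha_uncertain: float, alpha_confident: float) -> float:
    """Pick the teacher weight from the student's predictive entropy (ASSUMED rule).

    High entropy (uncertain student): alpha_uncertain, e.g. > 1 to overtrust the teacher.
    Low entropy (confident student): alpha_confident, e.g. <= 0 to disregard the teacher
    or push away from it. The threshold and both weights are hypothetical hyperparameters.
    """
    h = entropy_from_logits(student_logits)
    return alpha_uncertain if h > threshold else alpha_confident


def aggregate(student_logits: List[float], teacher_logits: List[float],
              alpha: float) -> List[float]:
    """Assumed aggregation rule: (1 - alpha) * log p_student + alpha * log p_teacher."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return [(1 - alpha) * s + alpha * t for s, t in zip(ls, lt)]


# Uniform student -> high entropy (ln 3, about 1.10 nats) -> overtrust the teacher.
alpha = choose_alpha([0.0, 0.0, 0.0], threshold=0.5, alpha_uncertain=1.5, alpha_confident=0.0)
scores = aggregate([0.0, 0.0, 0.0], [0.0, 1.0, 0.0], alpha)
print(alpha, scores.index(max(scores)))  # 1.5 1
```

In this parameterization, `alpha > 1` extrapolates past the teacher (overtrusting it), while `alpha < 0` makes the student's weight `1 - alpha` exceed 1, which overtrusts the student instead; `alpha = 0` simply ignores the teacher.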


Learning to Decode for Future Success

Learning to Decode Collaboratively with Multiple Language Models

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Graph-Structured Speculative Decoding

Speculative Contrastive Decoding

A Thorough Examination of Decoding Methods in the Era of LLMs

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation

CLLMs: Consistency Large Language Models

Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Online Speculative Decoding

Mixture of Attentions For Speculative Decoding