Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, Ravid Shwartz-Ziv
2024-10-13
Abstract: Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. However, popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures, leading to incoherent or repetitive outputs. To address this challenge, we propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability. We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures. Moreover, human evaluations reveal a clear preference for min-p sampling in terms of both text quality and diversity. Min-p sampling has been adopted by multiple open-source LLM implementations, highlighting its practical utility and potential impact.
Computation and Language
### What problem does this paper attempt to solve?

This paper addresses the challenge of balancing creativity and coherence when large language models (LLMs) generate text. Specifically, existing sampling methods such as top-p (nucleus sampling) often struggle to maintain both quality and diversity at higher temperatures, producing text that may be incoherent or repetitive. To solve this problem, the authors propose a new sampling method, min-p sampling, which balances creativity and coherence by dynamically adjusting the sampling threshold, especially in high-temperature settings.

### Main contributions

1. **Introduction of min-p sampling**: A new truncation method that dynamically adjusts the sampling threshold according to the model's confidence, maintaining the quality and diversity of the generated text even at high temperatures.
2. **Experimental verification**: Extensive experiments on multiple benchmark datasets indicate that min-p sampling outperforms existing methods such as top-p sampling and top-k sampling in the quality and diversity of the generated text.
3. **Human evaluation**: In a human evaluation, participants preferred the output of min-p sampling and rated it higher on both quality and diversity.
4. **Practical guide**: Experience-based guidelines for using min-p sampling, helping practitioners select appropriate hyperparameters and follow best practices.
5. **Community adoption**: Min-p sampling has been adopted by multiple open-source LLM implementations, which supports its effectiveness and practicality.

### Method overview

The core idea of min-p sampling is to dynamically adjust the sampling threshold according to the model's confidence at each decoding step. The specific steps are as follows:

1. **Calculate the maximum probability**: Find the probability of the most likely token in the distribution, \( p_{\text{max}} = \max_{v \in V} P(v \mid x_{1:t-1}) \).
2. **Define the truncation threshold**: Set a base probability threshold \( p_{\text{base}} \in (0, 1] \) and scale it by \( p_{\text{max}} \) to obtain the actual truncation threshold \( p_{\text{scaled}} = p_{\text{base}} \times p_{\text{max}} \).
3. **Define the sampling pool**: Construct a sampling pool \( V_{\text{min}} \) containing the tokens whose probability is greater than or equal to \( p_{\text{scaled}} \): \( V_{\text{min}} = \{ v \in V : P(v \mid x_{1:t-1}) \geq p_{\text{scaled}} \} \).
4. **Sample from the pool**: Sample the next token \( x_t \) from \( V_{\text{min}} \) according to the renormalized probability:

\[ P'(v) = \frac{P(v \mid x_{1:t-1})}{\sum_{v' \in V_{\text{min}}} P(v' \mid x_{1:t-1})} \quad \text{for} \quad v \in V_{\text{min}} \]

### Experimental results

The experimental results show that min-p sampling outperforms existing sampling methods on multiple benchmark datasets (such as GPQA, GSM8K, and AlpacaEval Creative Writing). Especially in high-temperature settings, min-p sampling better maintains the coherence and diversity of the generated text. The human evaluation supports this: participants generally rated the output of min-p sampling higher in both quality and diversity.

### Conclusion

By introducing min-p sampling, this paper provides a way to balance creativity and coherence at high temperatures, generating text that is both high-quality and diverse. The method performs well in the reported experiments and has been adopted by multiple open-source LLM implementations.
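
The four steps listed in the method overview can be illustrated with a minimal Python sketch. This is not the authors' reference implementation: the function names `min_p_filter` and `min_p_sample` are hypothetical, the input is assumed to be an already-computed next-token probability list (how temperature is applied beforehand is not covered here), and only the standard library is used.

```python
import random


def min_p_filter(probs, p_base):
    """Return the renormalized distribution P' over the vocabulary.

    Tokens outside the sampling pool V_min get probability 0.
    `probs` is assumed to be a next-token distribution that sums to 1.
    """
    if not 0.0 < p_base <= 1.0:
        raise ValueError("p_base must lie in (0, 1]")

    # Step 1: probability of the most likely token.
    p_max = max(probs)

    # Step 2: scale the base threshold by the model's confidence.
    p_scaled = p_base * p_max

    # Step 3: keep only tokens with probability >= p_scaled.
    kept = [p if p >= p_scaled else 0.0 for p in probs]

    # Step 4: renormalize over the surviving tokens.
    # The top token always survives, so the total is positive.
    total = sum(kept)
    return [p / total for p in kept]


def min_p_sample(probs, p_base, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    filtered = min_p_filter(probs, p_base)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

As an illustration with \( p_{\text{base}} = 0.1 \): a peaked distribution such as `[0.90, 0.05, 0.03, 0.02]` gives \( p_{\text{scaled}} = 0.09 \), so only the top token stays in the pool, while a flatter one such as `[0.30, 0.25, 0.20, 0.15, 0.10]` gives \( p_{\text{scaled}} = 0.03 \) and keeps all five tokens. This is the confidence-dependent behaviour the summary describes: the pool shrinks when the model is confident and widens when it is not.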