Abstract: Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher $p$ threshold in nucleus (top-$p$) sampling increases diversity but decreases factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved factuality and diversity over nucleus sampling by predicting an adaptive threshold $p$. Specifically, REAL sampling predicts the step-wise likelihood that an LLM will hallucinate and lowers the $p$ threshold when the LLM is likely to hallucinate. Otherwise, REAL sampling raises the $p$ threshold to boost diversity. To predict the step-wise hallucination likelihood without supervision, we construct a Token-level Hallucination Forecasting (THF) model that predicts the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies from a series of LLMs of different sizes. If an LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower $p$ threshold in REAL sampling. On the FactualityPrompts benchmark, we demonstrate that REAL sampling based on a 70M THF model can substantially improve the factuality and diversity of 7B LLMs simultaneously, as judged by both retrieval-based metrics and human evaluation. When combined with contrastive decoding, REAL sampling outperforms 9 sampling methods and generates texts that are more factual than greedy sampling and more diverse than nucleus sampling with $p=0.5$. Furthermore, the predicted asymptotic entropy is also a useful unsupervised signal for hallucination detection tasks.
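
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the power-law fit used to extrapolate the asymptotic entropy, and the exponential mapping from residual entropy to the $p$ threshold are all assumptions made for illustration. In the paper, a learned THF model produces the prediction, whereas this sketch substitutes a closed-form least-squares fit.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def asymptotic_entropy(sizes, entropies, exponent=0.5):
    """Extrapolate next-token entropy to an infinitely large model.

    ASSUMPTION (illustrative only): entropy follows a + b * size**(-exponent).
    We fit a and b by ordinary least squares and return the intercept a,
    i.e. the value the curve approaches as size grows without bound.
    """
    xs = [s ** (-exponent) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(entropies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, entropies))
    slope = sxy / sxx
    return max(0.0, mean_y - slope * mean_x)


def adaptive_p(llm_entropy, asym_entropy, temperature=1.0):
    """Map residual entropy to a top-p threshold in (0, 1].

    ASSUMPTION: an exponential mapping with a hypothetical `temperature`
    knob. A larger residual (the LLM is more uncertain than it should be)
    yields a smaller p; a zero residual yields p = 1.
    """
    residual = max(0.0, llm_entropy - asym_entropy)
    return math.exp(-residual / temperature)


def nucleus(probs, p):
    """Return the renormalized top-p distribution as {token_index: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:  # smallest prefix whose cumulative mass reaches p
            break
    return {i: probs[i] / mass for i in kept}


def real_step(probs, sizes, entropies_by_size, temperature=1.0):
    """One decoding step: pick p from the residual entropy, then truncate."""
    asym = asymptotic_entropy(sizes, entropies_by_size)
    p = adaptive_p(entropy(probs), asym, temperature)
    return p, nucleus(probs, p)
```

Here `real_step` takes the next-token distribution of the generating LLM together with the next-token entropies observed for a family of smaller and larger models. When the generating LLM's entropy exceeds the extrapolated asymptote, the threshold shrinks and fewer candidate tokens survive truncation.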

Penalizing the High-likelihood: A Novel Sampling Method for Open-ended Neural Text Generation via Inverse Probability Weighting

Improve the Diversity and Novelty for Open-Ended Neural Text Generation via Inverse Probability Weighting

Improving Diversity of Neural Text Generation Via Inverse Probability Weighting

A Simple, Fast Diverse Decoding Algorithm for Neural Generation

Closing the Curious Case of Neural Text Degeneration

Diversifying Neural Text Generation with Part-of-Speech Guided Softmax and Sampling

The Curious Case of Neural Text Degeneration

Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation

Lingxi: A Diversity-aware Chinese Modern Poetry Generation System

DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text

Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding

A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

Improving Open-Ended Text Generation via Adaptive Decoding

Differentiated Distribution Recovery for Neural Text Generation

REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy

Informed Sampling for Diversity in Concept-to-Text NLG

Efficient and Training-Free Control of Language Generation

Controllable Text Generation for Open-Domain Creativity and Fairness

Open-Sampling: Exploring Out-of-Distribution Data for Re-balancing Long-tailed Datasets

Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation