Abstract: Decoding strategies for large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Since LLMs produce probability distributions over the entire vocabulary, various decoding methods have been developed to transform these probabilities into coherent and fluent text, each with its own set of hyperparameters. In this study, we present a large-scale, comprehensive analysis of how hyperparameter selection affects text quality in open-ended text generation across multiple LLMs, datasets, and evaluation metrics. Through an extensive sensitivity analysis, we provide practical guidelines for hyperparameter tuning and demonstrate the substantial influence of these choices on text quality. Using three established datasets, spanning factual domains (e.g., news) and creative domains (e.g., fiction), we show that hyperparameter tuning significantly impacts generation quality, though its effects vary across models and tasks. We offer in-depth insights into these effects, supported by both human evaluations and a synthesis of widely-used automatic evaluation metrics.

What problem does this paper attempt to address?

The paper examines how the choice of decoding hyperparameters affects the quality of generated text in open-ended text generation. Through a large-scale analysis, it measures the impact of these choices across different large language models (LLMs), datasets, and evaluation metrics, and it provides practical hyperparameter tuning guidelines.

### Main Research Questions:

1. **Impact of Hyperparameter Choices on Text Quality**: The paper investigates how the hyperparameters of different decoding strategies affect the coherence and diversity of generated text.
2. **Effectiveness of Hyperparameters Across Different Models and Tasks**: It explores how the effectiveness of hyperparameter choices differs across models and tasks (e.g., news generation, story creation).
3. **Systematic Evaluation and Tuning Guidelines**: Through extensive sensitivity analysis, it provides systematic hyperparameter tuning guidelines to optimize the quality of generated text.

### Research Background:

- **Importance of Decoding Strategies**: LLMs produce probability distributions over the entire vocabulary, and a decoding strategy is needed to turn these distributions into natural language text. The choice of decoding strategy and of its hyperparameters significantly impacts the quality of the generated text.
- **Insufficiency of Existing Research**: Despite the importance of decoding strategies and their hyperparameters for text quality, this area remains under-researched. Users often rely on default settings or focus solely on model performance, neglecting the optimization of decoding strategies.

### Research Methods:

- **Experimental Design**: Experiments were conducted using seven different models (e.g., GPT2-XL, Mistral 7B, Llama 3.1) on three different datasets (news, Wikipedia, stories) with six decoding strategies (e.g., beam search, contrastive search, sampling).
- **Evaluation Metrics**: A combination of automatic evaluation metrics (e.g., coherence, diversity, MAUVE) and human evaluation was used to assess the quality of the generated text.

### Main Contributions:

1. **Large-Scale Sensitivity Analysis**: The paper conducts a large-scale sensitivity analysis to systematically evaluate the impact of different decoding strategies and their hyperparameters on text quality.
2. **Practical Tuning Guidelines**: It provides practical hyperparameter tuning guidelines to help researchers and practitioners choose appropriate decoding strategies and hyperparameters.
3. **Publicly Available Generated Text Data**: It generates 2.2 million text continuations and makes the data and code publicly available for future research.

### Conclusion:

- **Balancing Coherence and Diversity**: The study shows that successful text generation requires balancing coherence and diversity, as overemphasizing one aspect can lead to a decline in overall performance.
- **Importance of Hyperparameter Choices**: The choice of hyperparameters has a significant impact on the quality of generated text, sometimes a larger one than the scale of the model.
- **Future Research Directions**: Future research can explore the applicability of these methods to other NLP tasks (e.g., summarization, machine translation) and their performance in multilingual settings.

Through these studies, the paper provides theoretical and practical guidance for the choice of decoding strategies in open-ended text generation tasks.
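To make concrete what a decoding hyperparameter does, here is a minimal Python sketch. The summary names sampling as one of the decoding strategies but does not list its hyperparameters, so the three used here (temperature, top-k, and top-p) are assumptions chosen because they are common sampling controls, not the paper's confirmed configuration. The logits and all function names are made up for illustration and are not taken from the paper's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize the remaining mass."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def support_size(probs):
    """Count the tokens that can still be sampled after filtering."""
    return sum(1 for q in probs if q > 0.0)


# Hypothetical next-token logits over a five-word vocabulary.
logits = [2.0, 1.0, 0.5, 0.0, -1.0]

# A tiny sensitivity sweep: how temperature changes the peak probability
# and the number of candidates that survive top-p filtering at p = 0.9.
for temperature in (0.5, 1.0, 1.5):
    probs = softmax(logits, temperature)
    nucleus = top_p_filter(probs, p=0.9)
    print(temperature, round(max(probs), 3), support_size(nucleus))
```

In this toy setting, lowering the temperature concentrates probability on the top token and shrinks the set of tokens that survive top-p filtering, which pushes generation toward coherence at the cost of diversity. This is the trade-off the conclusion above describes.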

Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation
