Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to produce each draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To reduce the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step, we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding, respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Superposed Decoding can also be combined with other decoding strategies, resulting in universal coverage gains when scaling inference-time compute. Code and more examples are open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.
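The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `superpose` and `expand_and_prune`, the linear interpolation form, the weight `alpha`, and the toy bigram table are all assumptions made for illustration. The abstract states only that the $k$ drafts are combined with the top-$k$ tokens into $k^2$ candidates, that an n-gram interpolation filters them, and that the $k$ most likely are kept.

```python
import heapq
import math

def superpose(embeddings, weights):
    """Weighted sum of the k drafts' most recent token embeddings.
    Assumption: weights are normalized draft probabilities."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]

def expand_and_prune(drafts, top_tokens, ngram_prob, k, alpha=0.5):
    """Combine k drafts with the top-k tokens into k^2 candidates, keep the best k.

    drafts:     list of (token_tuple, log_prob)
    top_tokens: list of (token, lm_prob) from the single superposed forward pass
    ngram_prob: hypothetical callable (token_tuple, token) -> probability
    alpha:      assumed interpolation weight between LM and n-gram probabilities
    """
    candidates = []
    for tokens, logp in drafts:
        for tok, p_lm in top_tokens:
            # Interpolate the shared LM probability with a per-draft n-gram score.
            p = (1 - alpha) * p_lm + alpha * ngram_prob(tokens, tok)
            if p > 0:
                candidates.append((tokens + (tok,), logp + math.log(p)))
    return heapq.nlargest(k, candidates, key=lambda c: c[1])

# Toy example with a made-up bigram table (illustrative values only).
bigram = {("the", "cat"): 0.6, ("the", "dog"): 0.4,
          ("a", "cat"): 0.1, ("a", "dog"): 0.9}

def toy_ngram(tokens, tok):
    return bigram.get((tokens[-1], tok), 0.0)

drafts = [(("the",), math.log(0.6)), (("a",), math.log(0.4))]
top_tokens = [("cat", 0.5), ("dog", 0.5)]

mixed = superpose([[1.0, 0.0], [0.0, 1.0]], [0.6, 0.4])
new_drafts = expand_and_prune(drafts, top_tokens, toy_ngram, k=2)
for tokens, logp in new_drafts:
    print(" ".join(tokens), round(math.exp(logp), 2))
```

In this toy setting the language model assigns the same probability to both tokens, so the n-gram term alone decides which of the four candidates survive. This mirrors the role the abstract gives it: filtering out draft and token combinations that are incoherent.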

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
