Abstract: In many domains, autoregressive models can attain high likelihood on the task of predicting the next observation. However, this maximum-likelihood (MLE) objective does not necessarily match a downstream use-case of autoregressively generating high-quality sequences. The MLE objective weights sequences proportionally to their frequency under the data distribution, with no guidance for the model's behaviour out of distribution (OOD), leading to compounding error during autoregressive generation. In order to address this compounding error problem, we formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset, including divergences with weight on OOD generated sequences. The IL framework also allows us to incorporate backtracking by introducing a backspace action into the generation process. This further mitigates the compounding error problem by allowing the model to revert a sampled token if it takes the sequence OOD. Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes. We identify the SequenceMatch-$\chi^2$ divergence as a more suitable training objective for autoregressive models which are used for generation. We show empirically that SequenceMatch training leads to improvements over MLE on text generation with language models and arithmetic.
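To give a feel for why compounding error matters, the sketch below works through a deliberately simplified model that is not taken from the paper: it assumes each generated token independently takes the sequence out of distribution with a fixed probability `eps`, and that the model never recovers once this happens. The function name and the value of `eps` are hypothetical and chosen only for illustration.

```python
def prob_stays_in_distribution(eps: float, length: int) -> float:
    """Probability that a generated sequence never leaves the data distribution.

    Simplifying assumption (not from the paper): every step independently
    goes out of distribution with probability `eps`, with no recovery.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - eps) ** length


if __name__ == "__main__":
    # Even a 1% per-token error rate becomes significant for long sequences.
    for length in (10, 100, 1000):
        print(length, prob_stays_in_distribution(0.01, length))
```

Under this toy model the chance of staying in distribution decays geometrically with sequence length, which is the intuition behind giving the model a way to revert a bad token.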

What problem does this paper attempt to address?

This paper addresses the mismatch between the maximum-likelihood estimation (MLE) objective and the requirements of high-quality sequence generation with autoregressive models. Although autoregressive models can achieve high likelihood when predicting the next observation, the MLE objective does not necessarily lead to high-quality generated sequences, because errors compound during generation. These compounding errors cause the model to drift gradually away from the data distribution, resulting in low-quality or meaningless output.

To address this, the authors propose SequenceMatch, which formulates sequence generation as an imitation learning (IL) problem. Minimizing divergences between the distribution of generated sequences and the distribution of sequences in the dataset, in particular divergences that place weight on out-of-distribution (OOD) generated sequences, reduces the impact of compounding errors. SequenceMatch also introduces a "backspace" action that lets the model undo a sampled token during generation, further mitigating the compounding error problem.

In summary, the main contributions of this paper are:

1. It reformulates sequence generation as an imitation learning problem and proposes a general non-adversarial objective for minimizing a variety of divergences based on occupancy measures.
2. It develops a new masking scheme that allows Transformer-based autoregressive models to be trained with a backspace action without additional overhead.
3. It shows experimentally that models trained with SequenceMatch outperform models trained with the maximum-likelihood objective on text generation and arithmetic tasks.
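The effect of the backspace action on a generated sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `BACKSPACE` id, the function name, and the choice to treat a backspace on an empty sequence as a no-op are all assumptions made for this example.

```python
from typing import List, Sequence

# Hypothetical id reserved for the backspace action (an assumption for this sketch).
BACKSPACE = -1


def apply_backspaces(actions: Sequence[int]) -> List[int]:
    """Convert a sequence of sampled actions into the resulting token sequence.

    Each ordinary action appends a token; a BACKSPACE action removes the most
    recently kept token. A backspace on an empty sequence is ignored here,
    which is a choice made for this sketch only.
    """
    tokens: List[int] = []
    for action in actions:
        if action == BACKSPACE:
            if tokens:
                tokens.pop()
        else:
            tokens.append(action)
    return tokens


if __name__ == "__main__":
    # The model samples 7, reverts it, and samples 8 instead.
    print(apply_backspaces([5, 7, BACKSPACE, 8]))
```

In this picture the model's action space is the vocabulary plus one extra action, so a token that takes the sequence out of distribution can be reverted rather than built upon.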

SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
