SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking

Chris Cundy, Stefano Ermon
2024-05-07
Abstract: In many domains, autoregressive models can attain high likelihood on the task of predicting the next observation. However, this maximum-likelihood (MLE) objective does not necessarily match a downstream use-case of autoregressively generating high-quality sequences. The MLE objective weights sequences proportionally to their frequency under the data distribution, with no guidance for the model's behaviour out of distribution (OOD), leading to compounding error during autoregressive generation. In order to address this compounding error problem, we formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset, including divergences with weight on OOD generated sequences. The IL framework also allows us to incorporate backtracking by introducing a backspace action into the generation process. This further mitigates the compounding error problem by allowing the model to revert a sampled token if it takes the sequence OOD. Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes. We identify the SequenceMatch-$\chi^2$ divergence as a more suitable training objective for autoregressive models which are used for generation. We show that empirically, SequenceMatch training leads to improvements over MLE on text generation with language models and arithmetic.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the mismatch between the maximum-likelihood estimation (MLE) objective used to train autoregressive models and the practical goal of generating high-quality sequences. Although autoregressive models can attain high likelihood when predicting the next observation, the MLE objective gives no guidance on how the model should behave once generation leaves the data distribution. Errors therefore compound during autoregressive generation: the model drifts progressively further from the data distribution, resulting in low-quality or meaningless output.
To address this, the authors propose SequenceMatch, which formulates sequence generation as an imitation learning (IL) problem. The method minimizes a divergence between the distribution of generated sequences and the distribution of sequences in the dataset, including divergences that place weight on out-of-distribution (OOD) generated sequences, which reduces the impact of compounding error. SequenceMatch also introduces a "backspace" action that lets the model revert a sampled token during generation, further mitigating the compounding error problem.
The main contributions of the paper are:
1. It reformulates sequence generation as an imitation learning problem and proposes a general non-adversarial objective for minimizing a variety of divergences based on occupancy measures.
2. It develops a new masking scheme that allows Transformer-based autoregressive models to be trained with the backspace action without additional overhead.
3. It shows experimentally that SequenceMatch-trained models outperform MLE-trained models on text generation and arithmetic tasks.
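The effect of the backspace action on a generated sequence can be illustrated with a minimal sketch. It shows only how a stream of sampled actions resolves into a final token sequence; it does not implement the SequenceMatch objective or the masking scheme. The token name `<backspace>`, the function `apply_backspaces`, and the choice to treat a backspace on an empty sequence as a no-op are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

# Hypothetical name for the backspace action; the paper's actual token is not given here.
BACKSPACE = "<backspace>"


def apply_backspaces(actions: List[str]) -> List[str]:
    """Resolve a stream of sampled actions into the final token sequence.

    Ordinary tokens are appended. A backspace removes the most recently
    kept token. A backspace on an empty sequence is ignored (an assumption).
    """
    tokens: List[str] = []
    for action in actions:
        if action == BACKSPACE:
            if tokens:
                tokens.pop()  # revert the last sampled token
        else:
            tokens.append(action)
    return tokens


if __name__ == "__main__":
    # The model samples a wrong digit, then backtracks and corrects it.
    sampled = ["2", "+", "2", "=", "5", BACKSPACE, "4"]
    print(apply_backspaces(sampled))
```

In this toy arithmetic example the sampled "5" would take the sequence out of distribution; the backspace reverts it, so the final sequence reads `2 + 2 = 4` rather than continuing from the error.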