Language modeling via stochastic processes

Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
DOI: https://doi.org/10.48550/arXiv.2203.11370
2023-05-11
Abstract: Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning, which can be effective for discriminative tasks. Our work analyzes the application of contrastive representations for generative tasks, like long text generation. We propose one approach for leveraging contrastive representations, which we call Time Control (TC). TC first learns a contrastive representation of the target text domain, then generates text by decoding from these representations. Compared to domain-specific methods and fine-tuning GPT2 across a variety of text domains, TC performs competitively to methods specific for learning sentence representations on discourse coherence. On long text generation settings, TC preserves the text structure both in terms of ordering (up to $+15\%$ better) and text length consistency (up to $+90\%$ better).
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the lack of coherence and structure in text produced by existing language models when they generate long documents. Specifically:

1. **Limitations of existing language models**: Modern language models (such as GPT-2) generate short texts well, but they often go off-topic or become incoherent on long texts. The paper attributes this to the next-token-only language modeling objective, which does not capture how a long text evolves over its full length.
2. **Challenges in generating long texts**: Self-supervised methods can learn good latent representations through contrastive learning, but these representations have mainly been shown to be effective for discriminative tasks; their use for generative tasks such as long text generation is less explored. In addition, existing planning-based methods usually require text dynamics to be defined manually for each domain, which limits how well they generalize.
3. **Goal-oriented generation problem**: Autoregressive models struggle with goal-oriented generation over long texts, so the output lacks global coherence. For example, they often drift away from the intended endpoint, which leaves the text structure disordered.

To address these issues, the paper proposes Time Control (TC), a method that generates long texts with better global coherence by combining contrastive learning with Brownian bridge dynamics. TC first learns a latent space of contrastive representations of the target domain and then generates text by decoding from that latent space. The experiments reported in the paper show that TC better preserves text structure in long text generation, both in ordering (up to 15% better) and in text length consistency (up to 90% better).
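To make the Brownian bridge idea concrete, the following is a minimal sketch of sampling a latent trajectory that is pinned at a start point and an end point. For a standard Brownian bridge from $z_0$ at time $0$ to $z_T$ at time $T$, the marginal at time $t$ is Gaussian with mean $(1 - t/T)\,z_0 + (t/T)\,z_T$ and variance $t(T-t)/T$, so the path is most uncertain in the middle and fixed at both ends. The function name `sample_brownian_bridge`, the unit time steps, the isotropic noise, and the `sigma` scale are assumptions made for illustration; this is not the authors' implementation, and in TC the endpoints would come from a learned encoder rather than being hand-picked vectors.

```python
import random


def sample_brownian_bridge(z_start, z_end, num_steps, sigma=1.0, seed=None):
    """Sample a path pinned at z_start (t=0) and z_end (t=num_steps).

    Hypothetical helper: uses unit time steps and independent noise per
    dimension. Returns a list of num_steps + 1 latent vectors.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    path = [list(z_start)]
    for t in range(num_steps - 1):
        remaining = num_steps - t  # time left until the pinned endpoint
        current = path[-1]
        # Conditional law of the next point given the current point and the
        # endpoint: move 1/remaining of the way toward z_end, plus noise
        # whose variance shrinks to zero as the endpoint approaches.
        std = sigma * ((remaining - 1) / remaining) ** 0.5
        nxt = [
            c + (e - c) / remaining + rng.gauss(0.0, std)
            for c, e in zip(current, z_end)
        ]
        path.append(nxt)
    path.append(list(z_end))  # the bridge ends exactly at z_end
    return path


if __name__ == "__main__":
    # Toy 2-D latents standing in for encoded first and last sentences.
    for step, z in enumerate(sample_brownian_bridge([0.0, 0.0], [1.0, -1.0], 5, seed=0)):
        print(step, [round(v, 3) for v in z])
```

Setting `sigma=0.0` reduces the path to straight-line interpolation between the two endpoints, which is a quick sanity check on the drift term. A decoder conditioned on each sampled latent would then produce one sentence per step; that decoding stage is outside the scope of this sketch.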