Language modeling via stochastic processes

Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto

DOI: https://doi.org/10.48550/arXiv.2203.11370

2023-05-11

Abstract: Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning, which can be effective for discriminative tasks. Our work analyzes the application of contrastive representations for generative tasks, like long text generation. We propose one approach for leveraging contrastive representations, which we call Time Control (TC). TC first learns a contrastive representation of the target text domain, then generates text by decoding from these representations. Compared to domain-specific methods and fine-tuning GPT2 across a variety of text domains, TC performs competitively with methods specific for learning sentence representations on discourse coherence. On long text generation settings, TC preserves the text structure both in terms of ordering (up to $+15\%$ better) and text length consistency (up to $+90\%$ better).
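To make the "learn a contrastive representation" step concrete, here is a minimal Python sketch of a Brownian-bridge contrastive objective of the kind Time Control trains its encoder with. An encoder is assumed to map a start sentence, a middle sentence at position t, and an end sentence to latents z0, zt, zT, and the loss rewards zt for lying close to the bridge interpolation between z0 and zT relative to negative latents. The function names `bridge_score` and `contrastive_bridge_loss`, the unit-diffusion assumption (bridge variance t(T-t)/T), and plain Python lists standing in for encoder outputs are all hypothetical choices for illustration, not the authors' code; a real implementation would compute these scores on batched tensors with autograd.

```python
import math


def bridge_score(z0, zt, zT, t, T):
    """Log-density of zt, up to an additive constant, under a Brownian bridge
    pinned at z0 (time 0) and zT (time T). Assumes unit diffusion, so the
    bridge variance at time t is t * (T - t) / T."""
    if not 0 < t < T:
        raise ValueError("t must lie strictly between 0 and T")
    alpha = t / T
    # Bridge mean: linear interpolation between the two endpoint latents.
    mean = [(1 - alpha) * a + alpha * b for a, b in zip(z0, zT)]
    var = t * (T - t) / T
    sq_dist = sum((m - z) ** 2 for m, z in zip(mean, zt))
    return -sq_dist / (2 * var)


def contrastive_bridge_loss(z0, zt, zT, t, T, negatives):
    """InfoNCE-style loss: the true middle latent zt should score higher under
    the bridge than each negative latent (hypothetically, latents of other
    sentences in the batch)."""
    pos = bridge_score(z0, zt, zT, t, T)
    scores = [pos] + [bridge_score(z0, zn, zT, t, T) for zn in negatives]
    # Numerically stable log-sum-exp over positive and negative scores.
    m = max(scores)
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - pos


# Usage: the positive sits exactly on the bridge mean, the negative is far away.
loss = contrastive_bridge_loss([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], 1, 2,
                               negatives=[[3.0, 3.0]])
print(round(loss, 6))
```

With z0 = [0, 0], zT = [2, 2], t = 1 and T = 2, the bridge mean is [1, 1] and the variance is 0.5, so the positive [1, 1] scores 0 while the negative [3, 3] scores -8, and the loss is log(1 + e^-8), close to zero. Swapping the two makes the loss roughly 8.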

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses the lack of coherence and structure in long texts generated by existing language models. Specifically:

1. **Limitations of existing language models**: While modern language models (such as GPT-2) generate short texts well, they often go off-topic or become incoherent when generating long texts. These issues stem mainly from the models' reliance on the next-token language modeling objective alone, which does not capture how a long text evolves from beginning to end.
2. **Challenges in generating long texts**: Self-supervised methods can learn good latent representations through contrastive learning, but these representations have mainly been applied to discriminative tasks; their usefulness for generative tasks such as long text generation is less established. In addition, existing planning-based methods usually require manually defining text dynamics for each domain, which limits how well they generalize.
3. **Goal-oriented generation**: Autoregressive models struggle with goal-oriented generation over long texts, so the output lacks global coherence. For example, they often drift away from the intended ending, producing a disorganized text structure.

To address these issues, the paper proposes Time Control (TC), which generates long texts with better global coherence by combining contrastive learning with Brownian bridge dynamics. TC first learns a latent space of contrastive representations and then generates text by decoding from this latent space. Experiments show that TC better preserves both the ordering and the length consistency of the text structure when generating long texts.
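Brownian bridge dynamics tie a latent trajectory to a fixed start and end point, which is what targets the goal-oriented drift described in point 3. The following is a minimal Python sketch, assuming unit diffusion scaled by a hypothetical `sigma` parameter and plain Python lists as latents; the function `sample_bridge_plan` is an illustrative name, not part of the authors' code. It draws each next latent from the standard Brownian bridge conditional given the current latent and the endpoint, so every sampled plan finishes exactly at `z_end`. In Time Control a decoder then generates text conditioned on such a latent plan; that decoding step is not shown here.

```python
import math
import random


def sample_bridge_plan(z_start, z_end, num_steps, sigma=1.0, rng=None):
    """Sample latents z_0, ..., z_T (T = num_steps) from a Brownian bridge
    pinned at z_start (t = 0) and z_end (t = T).

    Each step draws z_{t+1} from the bridge conditional given z_t and z_T:
        mean = z_t + (z_T - z_t) / (T - t)
        var  = sigma**2 * (T - t - 1) / (T - t)
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    if rng is None:
        rng = random.Random()
    T = num_steps
    z = list(z_start)
    plan = [list(z)]
    for t in range(T):
        remaining = T - t
        if remaining == 1:
            # Final step has zero variance: land exactly on the goal latent.
            z = list(z_end)
        else:
            std = sigma * math.sqrt((remaining - 1) / remaining)
            z = [zi + (ei - zi) / remaining + std * rng.gauss(0.0, 1.0)
                 for zi, ei in zip(z, z_end)]
        plan.append(z)
    return plan


# Usage: a 5-step plan in a 2-D latent space, seeded for reproducibility.
plan = sample_bridge_plan([0.0, 0.0], [1.0, -1.0], 5, rng=random.Random(0))
for t, z in enumerate(plan):
    print(t, [round(v, 3) for v in z])
```

With `sigma=0.0` the noise vanishes and the plan reduces to straight-line interpolation: `sample_bridge_plan([0.0], [4.0], 4, sigma=0.0)` returns `[[0.0], [1.0], [2.0], [3.0], [4.0]]`. A nonzero `sigma` lets intermediate latents wander while the endpoints stay fixed.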
