A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

2023-04-11

Abstract: Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks such as image generation. However, because they are inherently built for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise cannot handle discrete corruption well, and objectives in continuous spaces are unstable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffuse LM, with lower training cost and better performance, inspired by linguistic features in languages. Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data. We also directly predict the categorical distribution with a cross-entropy loss function in every diffusion step to connect the continuous space and discrete space in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with better efficiency.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the challenges of applying diffusion models to language modeling. Existing diffusion models were designed for continuous data and have limitations on discrete data such as text. For example, the commonly used Gaussian noise does not handle discrete text corruption well, and objectives defined in a continuous space are unstable for text during the diffusion process, especially when the dimension is high. To address these issues, the authors propose a new diffusion model, Masked-Diffuse LM, which draws on linguistic features to improve the quality and efficiency of text generation. It does so in two ways:

1. **Soft Masking Strategy**: Uses a forward process informed by linguistic features, strategically soft-masking the input text so that the noise added is better suited to textual data.
2. **Direct Mapping of Continuous and Discrete Spaces**: Directly predicts the categorical distribution over tokens at each diffusion step with a cross-entropy loss, connecting the continuous and discrete spaces in a more efficient and straightforward way.

Experimental results show that Masked-Diffuse LM achieves better generation quality than existing state-of-the-art diffusion models on five controlled generation tasks, with higher training and inference efficiency. The method can also more readily incorporate large-scale pre-trained language models (such as BERT), further improving generation quality.
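The two ideas in this summary can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names, the per-token `scores`, the all-zero mask embedding, and the linear schedule are all assumptions made for illustration, and the noised embeddings are fed straight into the loss where a real model would first pass them through a denoising network.

```python
import numpy as np

def soft_mask(embeddings, mask_vec, scores, t, num_steps):
    """Blend token embeddings toward mask_vec at step t (0 = clean, num_steps = fully masked)."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    # Normalise the hypothetical per-token scores to [0, 1].
    norm = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    # Assumed schedule: higher-scoring tokens reach full masking earlier.
    weight = np.clip((t / num_steps) * (1.0 + norm), 0.0, 1.0)
    return (1.0 - weight)[:, None] * embeddings + weight[:, None] * mask_vec


def token_cross_entropy(hidden, embedding_table, token_ids):
    """Mean cross-entropy between predicted token distributions and the true tokens."""
    logits = hidden @ embedding_table.T                     # (seq_len, vocab_size)
    logits = logits - logits.max(axis=-1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(token_ids)), token_ids].mean())


rng = np.random.default_rng(0)
vocab_size, dim, num_steps = 10, 4, 8
embedding_table = rng.normal(size=(vocab_size, dim))
mask_vec = np.zeros(dim)                # assumed mask embedding
token_ids = np.array([3, 1, 7])
scores = np.array([0.9, 0.1, 0.5])      # hypothetical linguistic scores

clean = embedding_table[token_ids]
# Halfway through the forward process the top-scoring token is already fully masked.
noised = soft_mask(clean, mask_vec, scores, t=4, num_steps=num_steps)
# Stand-in for the per-step objective: score the (undenoised) vectors against the true tokens.
loss = token_cross_entropy(noised, embedding_table, token_ids)
```

In this sketch, soft-masking interpolates each embedding toward a mask vector instead of adding Gaussian noise, and the loss is computed over the vocabulary rather than as a distance in embedding space, which is the contrast the summary draws.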

Multimodal Latent Language Modeling with Next-Token Diffusion

Simple and Effective Masked Diffusion Language Models

Simplified and Generalized Masked Diffusion for Discrete Data

Think While You Generate: Discrete Diffusion with Planned Denoising

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models

Masked Diffusion Models Are Fast Distribution Learners

DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space

Denoising Diffusion Step-aware Models

Diffusion-LM Improves Controllable Text Generation

Latent Diffusion for Language Generation

Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

Diffusion Guided Language Modeling

Utilizing Latent Diffusion Model to Accelerate Sampling Speed and Enhance Text Generation Quality

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

PLM-Based Discrete Diffusion Language Models with Entropy-Adaptive Gibbs Sampling

Energy-Based Diffusion Language Models for Text Generation

Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation

Scaling Diffusion Language Models via Adaptation from Autoregressive Models