A Survey of Diffusion Models in Natural Language Processing

Hao Zou, Zae Myung Kim, Dongyeop Kang
DOI: https://doi.org/10.48550/arXiv.2305.14671
2023-06-15
Abstract: This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation. This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications. We also perform a thorough comparison between diffusion models and alternative generative models, specifically highlighting the autoregressive (AR) models, while also examining how diverse architectures incorporate the Transformer in conjunction with diffusion models. Compared to AR models, diffusion models have significant advantages for parallel generation, text interpolation, token-level controls such as syntactic structures and semantic contents, and robustness. Exploring further permutations of integrating Transformers into diffusion models would be a valuable pursuit. Also, the development of multimodal diffusion models and large-scale diffusion language models with notable capabilities for few-shot learning would be important directions for the future advance of diffusion models in NLP.
Computation and Language
What problem does this paper attempt to address?
This paper surveys how diffusion models are applied in natural language processing (NLP) and what challenges remain. It covers the following aspects:

1. **Applications of diffusion models in NLP**
   - Diffusion models are applied to NLP tasks such as natural language generation, sentiment analysis, topic modeling, and machine translation.
   - The paper discusses the specific methods used in these applications and their results.
2. **Mathematical framework of diffusion models**
   - The basic frameworks of diffusion models in continuous and discrete state spaces are introduced.
   - The forward and reverse processes are described mathematically, including detailed formulas for the noise-adding and denoising steps.
3. **Comparison between discrete diffusion models and embedding diffusion models**
   - Discrete diffusion models operate directly in the discrete input space, corrupting and restoring data at the token level.
   - Embedding diffusion models encode discrete text into a continuous space, then add and remove Gaussian noise there.
4. **Comparison between diffusion models and other generative models**
   - Compared with autoregressive (AR) models, diffusion models have significant advantages in parallel generation, text interpolation, token-level control, and robustness to input noise.
   - Compared with other latent variable models (such as VAEs and flow-based models), diffusion models follow a fixed procedure (the noising process is not learned), and the latent variables have the same dimension as the original data.
5. **Future research directions**
   - Explore further ways of combining the Transformer architecture with diffusion models to improve performance.
   - Develop multimodal diffusion models and large-scale diffusion language models, especially models with few-shot learning capabilities.
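The contrast in item 3 can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the five-word vocabulary, the 2-dimensional embedding table `EMBED`, masking as the token-level corruption, and the helper names `corrupt_discrete`, `corrupt_embedding`, and `round_to_tokens`. The sketch covers only the corruption side and a nearest-neighbour rounding step. A real model would learn the restoration.

```python
import math
import random

# Hypothetical toy vocabulary; index 0 is an absorbing [MASK] token.
VOCAB = ["[MASK]", "the", "cat", "sat", "down"]
MASK_ID = 0
# Hypothetical 2-d embedding table, one row per vocabulary entry.
EMBED = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def corrupt_discrete(token_ids, mask_prob, rng):
    """Token-level corruption: replace each token by [MASK] with prob. mask_prob."""
    return [MASK_ID if rng.random() < mask_prob else t for t in token_ids]


def corrupt_embedding(vectors, noise_scale, rng):
    """Embedding-level corruption: shrink each vector and add Gaussian noise."""
    keep = math.sqrt(1.0 - noise_scale ** 2)
    return [[keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in vec]
            for vec in vectors]


def round_to_tokens(vectors, table):
    """Map each (noisy) vector back to the id of its nearest embedding row."""
    def nearest(vec):
        return min(range(len(table)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, table[i])))
    return [nearest(vec) for vec in vectors]


rng = random.Random(0)
tokens = [1, 2, 3, 4]  # "the cat sat down"

# Discrete route: the corrupted sequence is still a list of token ids.
masked = corrupt_discrete(tokens, mask_prob=0.5, rng=rng)

# Embedding route: the corrupted sequence is a list of real-valued vectors,
# which must be rounded back to tokens at the end.
noisy = corrupt_embedding([EMBED[t] for t in tokens], noise_scale=0.1, rng=rng)
recovered = round_to_tokens(noisy, EMBED)

print([VOCAB[t] for t in masked])
print([VOCAB[t] for t in recovered])
```

The discrete route never leaves the vocabulary, while the embedding route needs an extra rounding step to return to tokens. That rounding step is one reason the two families are treated separately.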
### Formula summary

- **Forward process**, with \( \alpha_t = 1-\beta_t \), \( \bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s \), and \( z_{t-1} \sim \mathcal{N}(0, I) \):

  \[ q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right) \]

  \[ x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, z_{t-1} \]

  \[ q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}) \]

  \[ q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right) \]

- **Reverse process**:

  \[ p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right) \]

  \[ p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \]

- **Transition probabilities of discrete diffusion models**, with \( x_t \) a one-hot row vector and \( \bar{Q}_t = Q_1 Q_2 \cdots Q_t \):

  \[ q(x_t \mid x_{t-1}) = \text{Cat}(x_t;\ p = x_{t-1} Q_t) \]

  \[ q(x_t \mid x_0) = \text{Cat}(x_t;\ p = x_0 \bar{Q}_t) \]

  \[ q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)} \]

Using these formulas, the paper systematically reviews the application and development of diffusion models in NLP and points out future research directions.
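The closed-form expressions above can be checked numerically. The following is a minimal sketch and not the paper's code. It assumes the standard convention that alpha_t = 1 - beta_t and that alpha-bar_t is the running product of the alpha values. It also assumes a hypothetical linear noise schedule (`linear_betas`) and a hypothetical uniform transition matrix (`uniform_Q`). The first half confirms that iterating the one-step Gaussian transition reproduces the closed-form mean and variance. The second half builds the cumulative matrix product and evaluates the discrete posterior by Bayes' rule.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule beta_1..beta_T (an assumption)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]


def alpha_bars(betas):
    """Running products of alpha_s = 1 - beta_s."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def iterate_forward_moments(betas, x0=1.0):
    """Mean and variance of x_T obtained by applying q(x_t | x_{t-1}) T times."""
    mean, var = x0, 0.0
    for b in betas:
        mean = math.sqrt(1.0 - b) * mean  # mean is scaled by sqrt(alpha_t)
        var = (1.0 - b) * var + b         # old variance shrinks, beta_t is added
    return mean, var


def uniform_Q(K, beta):
    """Hypothetical uniform transition: keep the token with prob. 1 - beta,
    otherwise resample it uniformly from the K categories."""
    return [[(1.0 - beta if i == j else 0.0) + beta / K for j in range(K)]
            for i in range(K)]


def matmul(A, B):
    """Plain-Python matrix product of two square matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def posterior(x_t, x_0, Q_t, Qbar_prev, Qbar_t):
    """q(x_{t-1} = k | x_t, x_0) for every k, using the Markov property
    q(x_t | x_{t-1}, x_0) = q(x_t | x_{t-1})."""
    K = len(Q_t)
    return [Q_t[k][x_t] * Qbar_prev[x_0][k] / Qbar_t[x_0][x_t] for k in range(K)]


# Continuous case: iterated moments versus the closed form q(x_T | x_0).
betas = linear_betas(T=1000)
abar_T = alpha_bars(betas)[-1]
mean_T, var_T = iterate_forward_moments(betas, x0=1.0)
print(mean_T, math.sqrt(abar_T))  # the two means should agree
print(var_T, 1.0 - abar_T)        # the two variances should agree

# Discrete case: three steps over K = 4 categories.
K = 4
Qs = [uniform_Q(K, b) for b in (0.1, 0.2, 0.3)]
Qbar = [Qs[0]]
for Q in Qs[1:]:
    Qbar.append(matmul(Qbar[-1], Q))
post = posterior(x_t=2, x_0=0, Q_t=Qs[2], Qbar_prev=Qbar[1], Qbar_t=Qbar[2])
print(post, sum(post))  # a probability vector over x_{t-1}
```

Because the posterior's numerator, summed over all values of x_{t-1}, equals the corresponding entry of the cumulative matrix, the returned vector sums to one without any extra normalisation.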