Discrete Copula Diffusion

Anji Liu, Oliver Broadrick, Mathias Niepert, Guy Van den Broeck
2024-10-03
Abstract: Discrete diffusion models have recently shown significant progress in modeling complex data, such as natural languages and DNA sequences. However, unlike diffusion models for continuous data, which can generate high-quality samples in just a few denoising steps, modern discrete diffusion models still require hundreds or even thousands of denoising steps to perform well. In this paper, we identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps: they fail to capture dependencies between output variables at each denoising step. To address this issue, we provide a formal explanation and introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model. Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps. When we apply this approach to autoregressive copula models, the combined model outperforms both models individually in unconditional and conditional text generation. Specifically, the hybrid model achieves better (un)conditional text generation using 8 to 32 times fewer denoising steps than the diffusion model alone. In addition to presenting an effective discrete diffusion generation algorithm, this paper emphasizes the importance of modeling inter-variable dependencies in discrete diffusion.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the large number of denoising steps that discrete diffusion models need to generate high-quality samples. Diffusion models for continuous data can produce good samples in a few steps, whereas existing discrete diffusion models require hundreds or even thousands. The authors trace this gap to a single cause: at each denoising step, a discrete diffusion model fails to capture the dependencies between its output variables. To supply the missing dependency information, they combine the diffusion model with a second deep generative model, which they call the copula model. The method requires no fine-tuning of either model, yet it yields high-quality samples with significantly fewer denoising steps. When the copula model is autoregressive, the combined model outperforms each model used alone in both unconditional and conditional text generation, and it does so with 8 to 32 times fewer denoising steps than the diffusion model alone.
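The dependency problem can be illustrated with a toy example. The sketch below is not taken from the paper: the two-token distribution and all variable names are assumptions chosen for illustration. It shows that filling in two tokens independently in a single step, even with exactly correct per-token marginals, puts probability mass on sequences the true distribution never produces, while filling them in one at a time does not.

```python
import numpy as np

# Hypothetical "true" distribution over two binary tokens (x1, x2):
# only the sequences 00 and 11 occur, each with probability 0.5.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])

# Exact per-token marginals: the best a denoiser can do if it predicts
# each token independently of the others within a single step.
p_x1 = joint.sum(axis=1)  # distribution of x1
p_x2 = joint.sum(axis=0)  # distribution of x2

# Distribution obtained by sampling both tokens independently in one step.
independent = np.outer(p_x1, p_x2)

# Probability mass placed on the invalid sequences 01 and 10.
invalid_mass = independent[0, 1] + independent[1, 0]

# Total variation distance between the true joint and the factorized one.
tv_distance = 0.5 * np.abs(joint - independent).sum()

# Two-step alternative: sample x1 first, then x2 conditioned on x1.
cond_x2_given_x1 = joint / p_x1[:, None]
sequential = p_x1[:, None] * cond_x2_given_x1

print("invalid mass with one independent step:", invalid_mass)
print("total variation distance:", tv_distance)
print("two-step sampling recovers the joint:", np.allclose(sequential, joint))
```

In this toy setting, half of the single-step samples are invalid, which is the kind of error that more denoising steps, or an additional model carrying the dependency information, would be needed to remove.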