Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg
2024-07-12
Abstract: Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.
Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper examines why diffusion models lose sample quality during generation and proposes a new training objective to address it.

#### Core Issues:

1. **Degradation of Sample Quality**: The denoiser is poorly estimated in regions far from the training distribution (out-of-distribution, OOD), and the sampling process inevitably evaluates it in those regions. This lowers the quality of generated samples.
2. **Issues with Parallel Sampling**: Parallel sampling methods can significantly reduce generation time, but they initialize and update the entire sample trajectory at once, which leads to many OOD evaluations and degrades generation quality.

#### Solutions:

1. **Self-Supervised Training Objective**: The authors introduce a new self-supervised training objective, the Contrastive Diffusion Loss (CDL), which improves denoising performance in OOD regions by training the model to differentiate the levels of noise added to a sample.
2. **Theoretical Foundation**: The authors observe that diffusion models implicitly define a log-likelihood ratio (LLR) that distinguishes distributions with different amounts of noise. Because this ratio depends on denoiser performance outside the standard training distribution, optimizing it supplies a training signal in OOD regions.

#### Main Contributions:

1. **Information-Theoretic Connection**: Using information-theoretic methods, the authors demonstrate that the optimal denoiser is also the optimal classifier for predicting the amount of noise in an image.
2. **New Training Objective**: The paper proposes the Contrastive Diffusion Loss (CDL), which provides additional training signal beyond the standard Mean Squared Error (MSE) loss, especially in OOD regions.
3. **Experimental Validation**: Experiments show that CDL is effective in both sequential and parallel sampling settings, and that it significantly improves the speed and sample quality of parallel samplers.
In summary, the paper changes how diffusion models are trained so that the denoiser is more accurate in OOD regions. This improves sample quality for both sequential and parallel samplers and makes parallel sampling faster.
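The link between denoising and noise classification can be checked by hand in a toy setting. The sketch below is an illustration only, not the paper's CDL or its derivation. It assumes 1D data x ~ N(0, 1) and noisy samples z = x + sqrt(t) * eps, for which the optimal denoiser is z / (1 + t). It recovers the log-likelihood ratio between two noise levels from the denoiser alone, using Tweedie's formula for the score and the heat-equation identity d/dt log p_t = 0.5 * (ds/dz + s^2), and then uses the sign of that ratio as a noise-level classifier. All function names are hypothetical.

```python
import math
import random


def denoiser(z, t):
    """Optimal denoiser E[x | z] under the toy assumption x ~ N(0, 1), noise variance t."""
    return z / (1.0 + t)


def score(z, t):
    """Tweedie's formula: d/dz log p_t(z) = (E[x | z] - z) / t. Requires t > 0."""
    return (denoiser(z, t) - z) / t


def llr_from_denoiser(z, t0, t1, steps=2000, h=1e-4):
    """Estimate log p_t1(z) - log p_t0(z) using only denoiser evaluations.

    Integrates d/dt log p_t(z) = 0.5 * (ds/dz + s^2) over t with the midpoint rule.
    """
    dt = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * dt
        s = score(z, t)
        # Central finite difference for the derivative of the score in z.
        ds_dz = (score(z + h, t) - score(z - h, t)) / (2.0 * h)
        total += 0.5 * (ds_dz + s * s) * dt
    return total


def llr_closed_form(z, t0, t1):
    """Reference value: p_t is N(0, 1 + t) in this toy setting."""
    def log_p(t):
        var = 1.0 + t
        return -0.5 * math.log(2.0 * math.pi * var) - z * z / (2.0 * var)
    return log_p(t1) - log_p(t0)


def noise_classifier_accuracy(t0, t1, n=500, seed=0):
    """Classify each sample as low or high noise from the sign of the LLR."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        high = rng.random() < 0.5
        t = t1 if high else t0
        z = rng.gauss(0.0, 1.0) + math.sqrt(t) * rng.gauss(0.0, 1.0)
        predicted_high = llr_from_denoiser(z, t0, t1, steps=200) > 0.0
        correct += predicted_high == high
    return correct / n


if __name__ == "__main__":
    z, t0, t1 = 2.0, 0.25, 4.0
    print("LLR from denoiser:", llr_from_denoiser(z, t0, t1))
    print("LLR closed form:  ", llr_closed_form(z, t0, t1))
    print("classifier accuracy:", noise_classifier_accuracy(t0, t1))
```

Here the denoiser is exact everywhere, so the two LLR values agree. With a learned denoiser, the integral would pass through inputs z that are unlikely at some noise levels along the way, which is one way to see why the ratio depends on denoiser accuracy outside the standard training distribution.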