Abstract: In this paper we identify the source of a singularity in the training loss of key denoising models that causes the denoiser's predictions to collapse towards the mean of the source or target distributions. This degeneracy creates false basins of attraction, distorting the denoising trajectories and ultimately increasing the number of steps required to sample these models.
We circumvent this artifact by leveraging the deterministic ODE-based samplers, offered by certain denoising diffusion and score-matching models, which establish a well-defined change-of-variables between the source and target distributions. Given this correspondence, we propose a new probability flow model, the Lines Matching Model (LMM), which matches globally straight lines interpolating the two distributions. We demonstrate that the flow fields produced by the LMM exhibit notable temporal consistency, resulting in trajectories with excellent straightness scores.
Beyond its sampling efficiency, the LMM formulation allows us to enhance the fidelity of the generated samples by integrating domain-specific reconstruction and adversarial losses, and by optimizing its training for the sampling procedure used. Overall, the LMM achieves state-of-the-art FID scores with minimal NFEs on established benchmark datasets: 1.57/1.39 (NFE=1/2) on CIFAR-10, 1.47/1.17 on ImageNet 64x64, and 2.68/1.54 on AFHQ 64x64.
Finally, we provide a theoretical analysis showing that the use of optimal transport to relate the two distributions suffers from a curse of dimensionality, where the pairing set size (mini-batch) must scale exponentially with the signal dimension.
### What problem does this paper attempt to solve?
This paper addresses a key problem in training denoising models such as diffusion and score-matching models: a **singularity in the training loss**. The singularity causes the denoiser's predictions to collapse toward the mean of the source or target distribution. This creates false basins of attraction, distorts the denoising trajectories, and ultimately increases the number of steps required to sample from these models.
#### Specific manifestations of the problem:
1. **Denoiser degeneracy**: At low signal-to-noise ratio (SNR), the denoising target becomes highly ambiguous, so the loss-minimizing prediction collapses toward the mean of the source or target distribution.
2. **False basins of attraction**: These collapsed predictions act as spurious attractors that bend and distort the denoising trajectories.
3. **Low computational efficiency**: Curved trajectories need more sampling steps to integrate accurately, which raises the computational cost of generation.
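The collapse toward the mean can be reproduced on a toy problem. The sketch below is not the paper's code: it assumes a hypothetical three-point data set and additive Gaussian noise, and evaluates the MSE-optimal denoiser, which for an empirical distribution is the posterior-weighted average of the data points. The function name `optimal_denoiser` and the toy data are illustrative assumptions.

```python
import numpy as np

def optimal_denoiser(y, data, sigma):
    """MSE-optimal denoiser E[x | y] for an empirical data set, assuming y = x + sigma * noise."""
    # Squared distance from the noisy input to every data point.
    d2 = np.sum((data - y) ** 2, axis=1)
    # Posterior weights, computed as a numerically stable softmax.
    logits = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # The prediction is the posterior-weighted average of the data points.
    return w @ data

rng = np.random.default_rng(0)
data = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])  # hypothetical toy target set
mean = data.mean(axis=0)

for sigma in (0.1, 1.0, 100.0):
    y = data[0] + sigma * rng.standard_normal(2)  # noisy version of the first point
    x_hat = optimal_denoiser(y, data, sigma)
    print(f"sigma={sigma:6.1f}  distance to data mean = {np.linalg.norm(x_hat - mean):.4f}")
```

As `sigma` grows (low SNR), the posterior weights become nearly uniform and the prediction approaches the data mean regardless of which point was corrupted. At small `sigma` the prediction stays close to the original point.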
#### Solution:
To circumvent this problem, the authors propose a new probability flow model, the **Lines Matching Model (LMM)**. The paper's main contributions are:
1. **Using deterministic ODE samplers**: LMM leverages the deterministic ODE-based samplers offered by certain denoising diffusion and score-matching models, which establish a well-defined change of variables between the source and target distributions.
2. **Matching globally straight lines**: Given this correspondence, LMM matches globally straight lines interpolating the two distributions. The resulting flow fields show notable temporal consistency and produce trajectories with excellent straightness scores.
3. **Enhancing sample fidelity**: The LMM formulation allows domain-specific reconstruction losses and adversarial losses to be integrated, and its training can be optimized for the sampling procedure used.
4. **Theoretical analysis**: The authors show that using optimal transport (OT) to relate the two distributions suffers from a curse of dimensionality: the pairing set (mini-batch) size must scale exponentially with the signal dimension.
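To illustrate why straight trajectories matter for sampling cost, the following sketch compares two hypothetical velocity fields: a constant field (straight paths) and a rotation field (curved paths). It is a toy under stated assumptions, not the LMM itself. The straightness measure used here, the time-averaged squared deviation of the velocity from the chord between the endpoints, is one common choice, and the paper's exact score may differ.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps (NFE = n_steps)."""
    x, dt = np.array(x0, dtype=float), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

def straightness(velocity, x0, n_steps=1000):
    """Time-averaged squared deviation of the velocity from the chord x(1) - x(0).

    Zero means the path is a straight line traversed at constant speed.
    """
    x, dt = np.array(x0, dtype=float), 1.0 / n_steps
    start, velocities = x.copy(), []
    for k in range(n_steps):
        v = velocity(x, k * dt)
        velocities.append(v)
        x = x + dt * v
    chord = x - start
    return float(np.mean([np.sum((v - chord) ** 2) for v in velocities]))

# Hypothetical fields: constant velocity (straight) vs. half-turn rotation (curved).
straight_field = lambda x, t: np.array([-2.0, 0.0])
curved_field = lambda x, t: np.pi * np.array([-x[1], x[0]])

x0 = np.array([1.0, 0.0])
for name, field in (("straight", straight_field), ("curved", curved_field)):
    reference = euler_sample(field, x0, 1000)   # fine-grained integration
    one_step = euler_sample(field, x0, 1)       # NFE = 1
    err = np.linalg.norm(one_step - reference)
    print(f"{name:8s} straightness={straightness(field, x0):.4f}  one-step error={err:.4f}")
```

For the straight field a single Euler step lands on the same endpoint as the fine-grained integration, while the curved field incurs a large one-step error. This is the sense in which straighter flows permit sampling with fewer function evaluations.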
#### Experimental results:
LMM achieves state-of-the-art FID scores with minimal NFEs (numbers of function evaluations) on established benchmark datasets. At NFE = 1 and NFE = 2 respectively, it reaches FID scores of 1.57 and 1.39 on CIFAR-10, 1.47 and 1.17 on ImageNet 64x64, and 2.68 and 1.54 on AFHQ 64x64.
In conclusion, the paper identifies the source of the loss singularity in denoising models and circumvents it with LMM, improving both sampling efficiency and the quality of generated samples.