Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang
Abstract: Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, given a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ on an approximate posterior sample drawn from $p_{\theta}(X_0|X_t)$. However, most prior approximations rely on the posterior mean, which may not lie in the support of the image distribution and may therefore diverge from the appearance of genuine images. Such out-of-support samples may significantly degrade the performance of the operator $f(\cdot)$, particularly when it is a neural network. In this paper, we introduce a novel approach to posterior approximation that is guaranteed to generate valid samples within the support of the image distribution, and that also improves compatibility with neural network-based operators $f(\cdot)$. We first demonstrate that the solution of the Probability Flow Ordinary Differential Equation (PF-ODE) with an initial value $x_t$ yields an effective posterior sample from $p_{\theta}(X_0|X_t=x_t)$. Based on this observation, we adopt the Consistency Model (CM), which is distilled from the PF-ODE, for posterior sampling. Furthermore, we design a novel family of DIS using only CM. Through extensive experiments, we show that our proposed method for posterior sample approximation substantially enhances the effectiveness of DIS for neural network operators $f(\cdot)$ (e.g., in semantic segmentation). Additionally, our experiments demonstrate the effectiveness of the new CM-based inversion techniques. The source code is provided in the supplementary material.
### What problem does this paper attempt to solve?
This paper addresses the posterior sample approximation problem in Diffusion Inverse Solvers (DIS). Existing DIS methods estimate the conditional score function by evaluating the operator \(f(\cdot)\) on an approximate sample from the posterior distribution \(p_\theta(X_0 | X_t)\). Most of these approximations use the posterior mean, which may lie outside the support of the image distribution, so the approximate sample may not look like a genuine image. When the operator \(f(\cdot)\) is a neural network, such out-of-support inputs can significantly degrade its performance.
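The off-support behaviour of the posterior mean can be seen in a one-dimensional toy case. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical prior in which \(X_0\) is uniform on \(\{-1, +1\}\) and \(X_t = X_0 + \sigma\epsilon\) with standard Gaussian \(\epsilon\). The function names are made up for this example.

```python
import math


def prob_plus(x_t, sigma):
    """P(X_0 = +1 | X_t = x_t) for the toy prior X_0 ~ Uniform{-1, +1}.

    Assumption (not from the paper): X_t = X_0 + sigma * eps, eps ~ N(0, 1).
    Bayes' rule gives the logistic form 1 / (1 + exp(-2 x_t / sigma^2)),
    written here with tanh to avoid overflow for small sigma.
    """
    return 0.5 * (1.0 + math.tanh(x_t / sigma**2))


def posterior_mean(x_t, sigma):
    """E[X_0 | X_t = x_t]: a weighted average of the two support points."""
    p = prob_plus(x_t, sigma)
    return p * 1.0 + (1.0 - p) * (-1.0)


def distance_to_support(x):
    """Distance from x to the support {-1, +1} of the toy prior."""
    return min(abs(x - 1.0), abs(x + 1.0))


if __name__ == "__main__":
    for x_t in (0.0, 0.3, 2.0):
        m = posterior_mean(x_t, sigma=1.0)
        print(f"x_t={x_t:+.1f}  mean={m:+.3f}  "
              f"distance to support={distance_to_support(m):.3f}")
```

Every genuine sample in this toy prior is either \(-1\) or \(+1\), yet the posterior mean for an ambiguous \(x_t\) sits strictly between them. An operator trained only on genuine samples would never have seen such an input.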
### Main contributions of the paper
1. **Propose a new posterior sample approximation method**:
- The authors first show that the solution of the Probability Flow Ordinary Differential Equation (PF-ODE) with initial value \(x_t\) yields an effective posterior sample from \(p_\theta(X_0 | X_t = x_t)\).
- Based on this observation, the authors adopt the Consistency Model (CM), which is distilled from the PF-ODE, for posterior sample approximation.
2. **Design a new family of DIS**:
- The authors design a new family of DIS that relies only on the CM, and their experiments show that these CM-based inversion techniques are effective.
3. **Experimental verification**:
- Through extensive experiments, the authors show that the proposed posterior sample approximation substantially improves DIS performance for neural network operators (for example, semantic segmentation).
### Formulas and key concepts
- **PF-ODE**:
\[
\frac{dX_t}{dt} = -\frac{1}{2} \frac{d\sigma^2_t}{dt} s_\theta(t, X_t)
\]
where \(s_\theta(t, X_t)\) is the score function and \(\sigma^2_t\) is the variance schedule.
- **Posterior mean**:
\[
E[X_0 | X_t]=X_t+\sigma^2_t s_\theta(t, X_t)
\]
- **CM approximation**:
\[
x_{0|t} = g_\theta(t, x_t) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \tau^2 I)
\]
where \(g_\theta(t, x_t)\) is the one-step neural function obtained by CM training, and \(\epsilon\) is a small Gaussian noise added to improve robustness.
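
The three quantities above can be compared on a one-dimensional toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the prior is hypothetical (\(X_0\) uniform on \(\{-1, +1\}\)), the schedule is assumed to be \(\sigma_t = t\), the score is the closed-form score of that toy prior rather than a trained \(s_\theta\), and an Euler PF-ODE solver stands in for a trained consistency function \(g_\theta\). All function names are illustrative.

```python
import math
import random


def score(t, x):
    """Closed-form score of p_t for the toy prior X_0 ~ Uniform{-1, +1}.

    Assumption: sigma_t = t, so p_t = 0.5 N(1, t^2) + 0.5 N(-1, t^2) and
    d/dx log p_t(x) = (tanh(x / t^2) - x) / t^2.
    """
    return (math.tanh(x / t**2) - x) / t**2


def posterior_mean(t, x):
    """Posterior mean E[X_0 | X_t = x] = x + sigma_t^2 * score(t, x)."""
    return x + t**2 * score(t, x)


def pf_ode_endpoint(t, x, t_min=1e-2, n_steps=200):
    """Euler integration of dX/dt = -0.5 * d(sigma_t^2)/dt * score(t, X).

    With sigma_t^2 = t^2 the right-hand side is -t * score(t, X). Time runs
    from t down to t_min on a geometric grid (an arbitrary choice made here).
    """
    ratio = (t_min / t) ** (1.0 / n_steps)
    for _ in range(n_steps):
        t_next = t * ratio
        x = x + (t_next - t) * (-t * score(t, x))
        t = t_next
    return x


def cm_style_sample(g, t, x, tau, rng):
    """x_{0|t} = g(t, x) + eps with eps ~ N(0, tau^2).

    `g` is a stand-in for a trained consistency function; here the PF-ODE
    solver itself is passed in, since a CM is distilled from the PF-ODE.
    """
    return g(t, x) + rng.gauss(0.0, tau)


if __name__ == "__main__":
    t, x_t = 1.0, 0.5
    rng = random.Random(0)
    print("posterior mean :", posterior_mean(t, x_t))
    print("PF-ODE endpoint:", pf_ode_endpoint(t, x_t))
    print("CM-style sample:", cm_style_sample(pf_ode_endpoint, t, x_t, 0.01, rng))
```

In this toy setting the posterior mean falls between the two support points, while the PF-ODE endpoint lands close to one of them. That is the behaviour the summary attributes to PF-ODE-based posterior samples.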
### Experimental results
- **Neural network operators**:
- On tasks such as semantic segmentation, room layout estimation, image captioning, and image classification, the proposed CM approximation method significantly outperforms baseline methods (such as DPS) in terms of consistency and sample quality.
- **Non-neural network operators**:
- For simple down-sampling operators, the proposed method also performs well, although its advantage is less pronounced than on neural network operators.
### Conclusion
This paper significantly improves the performance of Diffusion Inverse Solvers (DIS) on neural network operators by using the Consistency Model (CM) as a posterior sample approximation. The experimental results show that the CM approximation not only generates effective posterior samples but also performs well across multiple tasks. Future work can further explore applying CM to larger images and to latent diffusion models.