BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

Fangyikang Wang, Hubery Yin, Yuejiang Dong, Huminhao Zhu, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li
2024-10-09
Abstract: The inversion of diffusion model sampling, which aims to find the corresponding initial noise of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion samplers have been proposed to address the inexact inversion issue in a training-free manner. However, the theoretical properties of these heuristic samplers remain unknown and they often exhibit mediocre sampling quality. In this paper, we introduce a generic formulation, Bidirectional Explicit Linear Multi-step (BELM) samplers, of the exact inversion samplers, which includes all previously proposed heuristic exact inversion samplers as special cases. The BELM formulation is derived from the variable-stepsize-variable-formula linear multi-step method via integrating a bidirectional explicit constraint. We highlight this bidirectional explicit constraint is the key of mathematically exact inversion. We systematically investigate the Local Truncation Error (LTE) within the BELM framework and show that the existing heuristic designs of exact inversion samplers yield sub-optimal LTE. Consequently, we propose the Optimal BELM (O-BELM) sampler through the LTE minimization approach. We conduct additional analysis to substantiate the theoretical stability and global convergence property of the proposed optimal sampler. Comprehensive experiments demonstrate our O-BELM sampler establishes the exact inversion property while achieving high-quality sampling. Additional experiments in image editing and image interpolation highlight the extensive potential of applying O-BELM in varying applications.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses inexact inversion in the sampling process of diffusion models (DMs). A diffusion model generates samples from initial noise by learning a reverse diffusion process. Inversion runs the other way: given a sample, it tries to recover the initial noise that produced it. With existing samplers such as DDIM, the recovered noise does not map back to the original sample exactly, and this mismatch degrades downstream tasks such as image reconstruction and editing.

To address this, the authors propose a new framework, the Bidirectional Explicit Linear Multi-step (BELM) sampler. BELM achieves mathematically exact inversion by imposing a bidirectional explicit constraint on a linear multi-step formulation, and it requires no additional training. The authors also derive the Optimal BELM (O-BELM) sampler by minimizing the Local Truncation Error (LTE), with the aim of improving sampling accuracy and stability.

### Main Contributions

1. **Propose the BELM framework**: The framework includes all existing heuristic exact-inversion samplers as special cases and achieves mathematically exact inversion through a bidirectional explicit constraint.
2. **Design the O-BELM sampler**: Minimizing the local truncation error within the BELM family yields a sampler whose LTE is of higher order than that of the existing heuristic designs.
3. **Theoretical guarantee**: The paper provides theoretical guarantees for the global stability and convergence of O-BELM.
4. **Experimental verification**: Experiments on image reconstruction, unconditional and conditional image generation, and image editing indicate that O-BELM achieves exact inversion while maintaining high-quality sampling.
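
The inexact-inversion problem described above can be illustrated with a small numerical sketch. In the rescaled variables $\bar{x}=x/\alpha$ and $\bar{\sigma}=\sigma/\alpha$, a DDIM step is an explicit Euler step, so the sketch below runs Euler steps down a noise schedule and then naively runs the same rule back up. The noise predictor `eps_toy`, the schedule, and the starting point are all hypothetical stand-ins chosen for illustration, not anything taken from the paper.

```python
import numpy as np

def eps_toy(x_bar, sigma_bar):
    # Hypothetical stand-in for a trained noise predictor (assumption).
    return np.tanh(x_bar) / (1.0 + sigma_bar)

def euler_step(x_bar, sigma_from, sigma_to):
    # Explicit Euler step: the slope is evaluated at the *departure* point.
    return x_bar + (sigma_to - sigma_from) * eps_toy(x_bar, sigma_from)

# Illustrative decreasing schedule of rescaled noise levels (assumption).
sigmas = np.linspace(5.0, 0.1, 11)
x_start = np.array([1.5, -0.7, 0.2])

# Sampling direction: high noise -> low noise.
x = x_start.copy()
for s_from, s_to in zip(sigmas[:-1], sigmas[1:]):
    x = euler_step(x, s_from, s_to)
x_sample = x

# Naive inversion: the same Euler rule applied on the reversed schedule.
# Each inverse step evaluates the slope at a different point than the
# forward step did, so the forward update is not undone exactly.
reversed_sigmas = sigmas[::-1]
for s_from, s_to in zip(reversed_sigmas[:-1], reversed_sigmas[1:]):
    x = euler_step(x, s_from, s_to)
x_recovered = x

roundtrip_error = float(np.max(np.abs(x_recovered - x_start)))
print("round-trip error of naive Euler inversion:", roundtrip_error)
```

The round-trip error here is far above floating-point precision, because the inverse step would need the slope at a state it has not computed yet. A bidirectional explicit scheme avoids this by making both directions evaluate the network only at states that are already known.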
### Formula Summary

- **General form of BELM sampler**:

\[ \bar{x}_{i-1}=\sum_{j=1}^{k}a_{i,j}\cdot\bar{x}_{i-1+j}+\sum_{j=1}^{k-1}b_{i,j}\cdot h_{i-1+j}\cdot\bar{\varepsilon}_{\theta}(\bar{x}_{i-1+j},\bar{\sigma}_{i-1+j}) \]

where $\bar{x}(t)=\frac{x(t)}{\alpha_t}$, $\bar{\sigma}(t)=\frac{\sigma_t}{\alpha_t}$, $\bar{\varepsilon}_{\theta}(\bar{x}(t),\bar{\sigma}(t))=\varepsilon_{\theta}(x(t),t)$.

- **Specific formula of O-BELM sampler**:

\[ x_{i-1}=\frac{h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i+1}}x_{i+1}+\frac{h_{i+1}^2-h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_i}x_i-\frac{h_i(h_i+h_{i+1})}{h_{i+1}}\alpha_{i-1}\varepsilon_{\theta}(x_i,i) \]

Through these improvements, O-BELM shows significant advantages both theoretically and experimentally, especially in terms of exact inversion and high-quality sampling.
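
The O-BELM formula above can be read in both directions. It gives $x_{i-1}$ from $(x_{i+1}, x_i)$, and because the network is evaluated only at $x_i$, the same relation can be solved algebraically for $x_{i+1}$ given $(x_{i-1}, x_i)$. The sketch below checks this round trip numerically. Several pieces are assumptions of the sketch rather than statements from the summary: the step size is taken as $h_i=\bar{\sigma}_i-\bar{\sigma}_{i-1}$ (the sign convention under which the coefficients are consistent with a Taylor expansion), the schedule assumes $\alpha^2+\sigma^2=1$, `eps_toy` is a hypothetical stand-in for $\varepsilon_\theta$, and the first state pair is bootstrapped with a single Euler step.

```python
import numpy as np

N = 8
# Illustrative schedule (assumption): sigma_bar increases with the index i.
sigma_bar = np.linspace(0.1, 3.0, N + 1) ** 1.5
alpha = 1.0 / np.sqrt(1.0 + sigma_bar ** 2)  # assumes alpha^2 + sigma^2 = 1
h = np.zeros(N + 1)
h[1:] = np.diff(sigma_bar)  # h[i] = sigma_bar[i] - sigma_bar[i-1] (assumed)

def eps_toy(x, i):
    # Hypothetical stand-in for epsilon_theta(x_i, i).
    return np.tanh(x) / (1.0 + 0.1 * i)

def coeffs(i):
    # Coefficients of the O-BELM formula at step i.
    r = h[i] ** 2 / h[i + 1] ** 2
    a_far = r * alpha[i - 1] / alpha[i + 1]          # multiplies x_{i+1}
    a_near = (1.0 - r) * alpha[i - 1] / alpha[i]     # multiplies x_i
    c = h[i] * (h[i] + h[i + 1]) / h[i + 1] * alpha[i - 1]
    return a_far, a_near, c

def obelm_step(x_next, x_cur, i):
    # Sampling direction: x_{i-1} from (x_{i+1}, x_i).
    a_far, a_near, c = coeffs(i)
    return a_far * x_next + a_near * x_cur - c * eps_toy(x_cur, i)

def obelm_inverse_step(x_prev, x_cur, i):
    # Inversion direction: the same relation solved for x_{i+1}.
    a_far, a_near, c = coeffs(i)
    return (x_prev - a_near * x_cur + c * eps_toy(x_cur, i)) / a_far

rng = np.random.default_rng(0)
x_N = rng.standard_normal(4)
# Bootstrap of the second state with one Euler step (assumption of this sketch).
x_Nm1 = alpha[N - 1] * (x_N / alpha[N] - h[N] * eps_toy(x_N, N))

# Sampling: i = N-1, ..., 1 produces x_{N-2}, ..., x_0.
x_next, x_cur = x_N, x_Nm1
for i in range(N - 1, 0, -1):
    x_next, x_cur = x_cur, obelm_step(x_next, x_cur, i)
x_1, x_0 = x_next, x_cur

# Inversion: i = 1, ..., N-1 recovers x_2, ..., x_N.
x_prev, x_cur = x_0, x_1
for i in range(1, N):
    x_prev, x_cur = x_cur, obelm_inverse_step(x_prev, x_cur, i)

max_error = float(max(np.max(np.abs(x_cur - x_N)),
                      np.max(np.abs(x_prev - x_Nm1))))
print("round-trip error of the two-step relation:", max_error)
```

In this toy setting the starting pair is recovered up to floating-point rounding, regardless of how `eps_toy` is defined, since both directions use identical network evaluations. The sketch keeps a pair of states throughout; how a practical pipeline obtains the second state at the start of sampling or inversion is not covered by the summary.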