ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval

Kexun Zhang, Xianjun Yang, William Yang Wang, Lei Li
2023-10-26
Abstract: Diffusion models show promising generation capability for a variety of data. Despite their high generation quality, the inference for diffusion models is still time-consuming due to the numerous sampling iterations required. To accelerate the inference, we propose ReDi, a simple yet learning-free Retrieval-based Diffusion sampling framework. From a precomputed knowledge base, ReDi retrieves a trajectory similar to the partially generated trajectory at an early stage of generation, skips a large portion of intermediate steps, and continues sampling from a later step in the retrieved trajectory. We theoretically prove that the generation performance of ReDi is guaranteed. Our experiments demonstrate that ReDi improves the model inference efficiency by 2x speedup. Furthermore, ReDi is able to generalize well in zero-shot cross-domain image generation such as image stylization.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to accelerate the inference of diffusion models while maintaining generation quality. Although diffusion models perform well on many data types, their inference is time-consuming because it requires a large number of sampling iterations. To address this, the authors propose ReDi (Retrieval-based Diffusion sampling), a learning-free framework based on trajectory retrieval. ReDi reduces the number of function evaluations (NFEs), and therefore speeds up inference, by retrieving from a precomputed knowledge base a trajectory similar to the partially generated one and skipping a large portion of the intermediate steps.

### Main contributions:

1. **Propose ReDi**: A learning-free framework based on trajectory retrieval for accelerating diffusion-model inference. ReDi reduces the number of function evaluations by skipping some intermediate steps.
2. **Theoretical guarantee**: The authors prove that the generation quality of ReDi has a theoretical bound, and that this bound correlates well with the actual performance.
3. **Experimental verification**: Experiments show that ReDi not only improves inference efficiency but also performs well in zero-shot domain adaptation.

### Key points of the solution:

- **Knowledge base construction**: In the precomputation stage, a knowledge base of trajectories is constructed. For each trajectory, an early step serves as the key and a later step as the value.
- **Inference process**: During inference, the first few steps of a trajectory are generated as usual. A similar trajectory is then retrieved from the knowledge base, the intermediate steps are skipped, and generation continues directly from the later step of the retrieved trajectory.
- **Zero-shot domain adaptation**: By extending the ReDi framework, images in different styles can be generated from a single-style knowledge base without rebuilding the knowledge base.

### Experimental results:

- **Acceleration effect**: ReDi achieves a 2x inference speedup while maintaining generation quality.
- **Zero-shot domain adaptation**: ReDi performs well in zero-shot domain adaptation and can generate images that conform to a specified style while keeping the layout unchanged.

### Theoretical analysis:

- **Assumptions**: The noise prediction model \(\epsilon_\theta(x_t, t)\) is \(L_0\)-Lipschitz, and the distance between the query and the nearest retrieved key is bounded.
- **Theorem**: If \(d(x_k, \text{key}) \leq \epsilon\), then \(d(x_v, \text{val}) \leq e^{O(k - v)}\epsilon\). In other words, if the retrieved key is close enough to the query \(x_k\), the retrieved value \(x_v'\) is a good substitute for the actual \(x_v\).

With these methods and analyses, ReDi addresses the efficiency problem of diffusion-model inference while maintaining generation quality.
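
The knowledge-base construction and retrieval steps listed under "Key points of the solution" can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_step` is a hypothetical stand-in for one sampler step of a real diffusion model, retrieval is simplified to a single nearest neighbor under Euclidean distance, and the step indices (`n_steps=50`, `k=45`, `v=20`) are illustrative values chosen so that the NFE count is halved.

```python
import numpy as np

def toy_step(x, t):
    """Hypothetical stand-in for one deterministic sampler step x_t -> x_{t-1}.
    A real system would call the diffusion model's noise predictor here."""
    return 0.9 * x + 0.1 * np.tanh(x)

def sample_trajectory(x_start, t_from, t_to):
    """Run the toy sampler from step t_from down to step t_to.
    Returns the final state and the number of function evaluations (NFEs)."""
    x, nfe = x_start, 0
    for t in range(t_from, t_to, -1):
        x = toy_step(x, t)
        nfe += 1
    return x, nfe

def build_knowledge_base(n_traj, dim, n_steps, k, v, rng):
    """Precompute trajectories; store the early state x_k as key, the later state x_v as value."""
    keys, vals = [], []
    for _ in range(n_traj):
        x_start = rng.standard_normal(dim)            # initial noise at step n_steps
        x_k, _ = sample_trajectory(x_start, n_steps, k)
        x_v, _ = sample_trajectory(x_k, k, v)
        keys.append(x_k)
        vals.append(x_v)
    return np.stack(keys), np.stack(vals)

def redi_sample(x_start, keys, vals, n_steps, k, v):
    """Retrieval-based sampling: compute x_k, look up the nearest key,
    jump to its stored value, and finish sampling from step v."""
    x_k, nfe_head = sample_trajectory(x_start, n_steps, k)
    nearest = np.argmin(np.linalg.norm(keys - x_k, axis=1))
    x_v_retrieved = vals[nearest]                     # steps k-1 .. v are skipped
    x_0, nfe_tail = sample_trajectory(x_v_retrieved, v, 0)
    return x_0, nfe_head + nfe_tail

rng = np.random.default_rng(0)
keys, vals = build_knowledge_base(n_traj=100, dim=8, n_steps=50, k=45, v=20, rng=rng)
x_start = rng.standard_normal(8)
_, nfe_full = sample_trajectory(x_start, 50, 0)
_, nfe_redi = redi_sample(x_start, keys, vals, 50, 45, 20)
print(nfe_full, nfe_redi)  # 50 full-sampling NFEs vs. 25 with retrieval
```

In this toy setting the retrieval path runs 5 steps before the lookup and 20 after it, so the saving comes entirely from the skipped steps between `k` and `v`. How close the result is to full sampling depends on how close the nearest key is to the query, which is the quantity the theorem bounds.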