Improving the Training of Rectified Flows

Sangyun Lee, Zinan Lin, Giulia Fanti
2024-10-09
Abstract: Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 75% in the 1 NFE setting on CIFAR-10. On ImageNet 64×64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
### What problem does this paper attempt to solve?

This paper aims to improve the training of **Rectified Flows** so that they can compete with knowledge distillation methods in the low number-of-function-evaluations (NFE) setting. Specifically:

1. **Problem Background**:
   - **Diffusion Models** perform well in image and video generation, but sampling from them requires expensive numerical integration of a generative ordinary differential equation (ODE).
   - **Rectified Flows** iteratively learn smooth ODE paths that are less susceptible to truncation error. They still require a relatively large number of NFEs, so they underperform knowledge distillation methods in the low-NFE setting.
2. **Main Findings**:
   - Under realistic settings, a single iteration of the Reflow algorithm is sufficient to learn nearly straight trajectories, so the current practice of running multiple Reflow iterations is unnecessary.
   - The paper proposes techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and an LPIPS-Huber premetric.
3. **Improvement Goals**:
   - Make rectified flows competitive in the low-NFE setting with state-of-the-art knowledge distillation methods such as consistency distillation and progressive distillation.

With these techniques, the paper improves the FID of the previous 2-rectified flow by up to 75% in the 1-NFE setting on CIFAR-10. On ImageNet 64×64, the improved rectified flow outperforms consistency distillation and progressive distillation in both one-step and two-step settings and rivals improved consistency training (iCT) in FID.
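
To make the two training techniques named above more concrete, here is a minimal sketch of what a U-shaped timestep sampler and a Huber-style distance could look like. Everything in it is an illustrative assumption and is not taken from the paper or its repository: the density p(t) ∝ cosh(a(t − 1/2)) on [0, 1], the sharpness `a = 4.0`, the constant `c`, and the function names are all hypothetical. The distance is the generic Pseudo-Huber form, and the LPIPS term is omitted because it needs a pretrained perceptual network.

```python
import math
import random


def u_shaped_inverse_cdf(u: float, a: float = 4.0) -> float:
    """Map u in [0, 1] to t in [0, 1] under the assumed density
    p(t) proportional to cosh(a * (t - 0.5)), which is symmetric and U-shaped.

    The CDF is F(t) = (sinh(a * (t - 0.5)) + sinh(a / 2)) / (2 * sinh(a / 2)),
    so it can be inverted in closed form with asinh.
    """
    return 0.5 + math.asinh((2.0 * u - 1.0) * math.sinh(a / 2.0)) / a


def sample_timestep(rng: random.Random, a: float = 4.0) -> float:
    """Draw one timestep by inverse-transform sampling."""
    return u_shaped_inverse_cdf(rng.random(), a)


def pseudo_huber(x, y, c: float = 1.0) -> float:
    """Generic Pseudo-Huber distance sqrt(||x - y||^2 + c^2) - c.

    It behaves like a squared error for small residuals and like an
    L2 norm for large ones. The value of c here is an assumption.
    """
    squared_norm = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(squared_norm + c * c) - c


if __name__ == "__main__":
    rng = random.Random(0)
    timesteps = [sample_timestep(rng) for _ in range(10_000)]
    middle = sum(0.25 <= t <= 0.75 for t in timesteps) / len(timesteps)
    # A uniform sampler would put half of the samples in this interval.
    print(f"fraction of samples in [0.25, 0.75]: {middle:.3f}")
```

Under this assumed density, the probability mass on [0.25, 0.75] is sinh(1)/sinh(2) ≈ 0.32, compared with 0.5 for uniform sampling, so training sees timesteps near 0 and 1 more often than those in the middle.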