Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs

Kaiwen Zheng,Cheng Lu,Jianfei Chen,Jun Zhu

2024-04-06

Abstract: Diffusion models have exhibited excellent performance in various domains. The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation. However, the likelihood estimation results by diffusion ODEs are still far from those of the state-of-the-art likelihood-based generative models. In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives. For training, we propose velocity parameterization and explore variance reduction techniques for faster convergence. We also derive an error-bounded high-order flow matching objective for finetuning, which improves the ODE likelihood and smooths its trajectory. For evaluation, we propose a novel training-free truncated-normal dequantization to fill the training-evaluation gap commonly existing in diffusion ODEs. Building upon these techniques, we achieve state-of-the-art likelihood estimation results on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation, and 2.42 on CIFAR-10 with data augmentation. Code is available at https://github.com/thu-ml/i-DODE.

Machine Learning

What problem does this paper attempt to address?

This paper addresses the weak likelihood estimates obtained from diffusion ordinary differential equations (ODEs) under maximum likelihood estimation (MLE) and proposes improved techniques for both training and evaluation on image datasets. The authors find that existing methods are suboptimal in both training and evaluation, which leaves a gap to state-of-the-art likelihood-based generative models. For evaluation, the paper introduces a training-free truncated-normal dequantization to close the training-evaluation gap, together with a weighted likelihood estimator for tighter bounds. For training, it accelerates convergence with velocity parameterization and variance reduction techniques, and derives an error-bounded high-order flow matching objective for fine-tuning, which improves the ODE likelihood and smooths its trajectory. With these techniques the model achieves state-of-the-art likelihood estimation results on CIFAR-10 (2.56) and ImageNet-32 (3.43/3.69) without variational dequantization or data augmentation.
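
To make the dequantization idea concrete, here is a minimal sketch of dequantizing 8-bit pixels with truncated-normal noise instead of uniform noise. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `sigma` value, and the choice to truncate at the edges of each quantization bin are hypothetical, and the way the paper ties the noise scale to its diffusion schedule is not reproduced here.

```python
import math
import random
from statistics import NormalDist


def sample_truncated_normal(sigma, half_width, rng):
    """Draw one sample from N(0, sigma^2) restricted to [-half_width, half_width]."""
    dist = NormalDist(0.0, sigma)
    lo, hi = dist.cdf(-half_width), dist.cdf(half_width)
    u = rng.uniform(lo, hi)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = dist.inv_cdf(u)
    # Guard against tiny numerical overshoot at the bin edges.
    return min(max(x, -half_width), half_width)


def truncated_normal_logpdf(u, sigma, half_width):
    """Log-density of the truncated normal above, evaluated at u."""
    if abs(u) > half_width:
        return -math.inf
    dist = NormalDist(0.0, sigma)
    mass = dist.cdf(half_width) - dist.cdf(-half_width)
    return math.log(dist.pdf(u)) - math.log(mass)


def dequantize(pixels, sigma, rng, levels=256):
    """Map integer pixels in [0, levels) to [-1, 1] and add truncated-normal noise.

    Returns the continuous values and the total log-density of the noise,
    which is the term subtracted in the dequantization likelihood bound.
    """
    half_width = 1.0 / levels  # each bin has width 2 / levels on [-1, 1]
    values, log_q = [], 0.0
    for p in pixels:
        center = (2 * p + 1) / levels - 1.0  # centre of the pixel's bin
        u = sample_truncated_normal(sigma, half_width, rng)
        values.append(center + u)
        log_q += truncated_normal_logpdf(u, sigma, half_width)
    return values, log_q


rng = random.Random(0)
values, log_q = dequantize([0, 128, 255], sigma=0.01, rng=rng)
```

With any dequantization distribution q supported on the pixel's bin, the discrete log-likelihood is bounded below by the expectation of log p(x + u) - log q(u) under q, so `log_q` is the correction that a continuous model's log-density would need. When `sigma` is much larger than the bin width, the truncated normal approaches the uniform noise used in standard uniform dequantization.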

Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching

Maximum Likelihood Training of Implicit Nonlinear Diffusion Models

Physics Informed Distillation for Diffusion Models

Exploring the Optimal Choice for Generative Processes in Diffusion Models: Ordinary vs Stochastic Differential Equations

Learning Quantized Adaptive Conditions for Diffusion Models

The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

A Geometric Perspective on Diffusion Models

Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations

A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Observation-Guided Diffusion Probabilistic Models

Statistical algorithms for low-frequency diffusion data: A PDE approach

Data Augmentation for Diffusions

AdjointDEIS: Efficient Gradients for Diffusion Models

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Improved Convergence Rate for Diffusion Probabilistic Models

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Distribution learning via neural differential equations: a nonparametric statistical perspective

Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions