Abstract: Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by recent efforts to learn EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we introduce several practical designs for EBM training to further improve sample quality. Combining these advances, our approach significantly boosts generation performance compared to existing EBM methods on the CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models on several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting, and out-of-distribution detection.

What problem does this paper attempt to address?

This paper proposes Cooperative Diffusion Recovery Likelihood (CDRL), a method for improving the training of Energy-Based Models (EBMs) on high-dimensional data and the quality of the samples they produce. EBMs have proven flexible and practical in settings such as image generation and graph generation, but they are hard to train and their sample quality lags behind that of diffusion models and Generative Adversarial Networks (GANs). The paper notes that although the recent Diffusion Recovery Likelihood (DRL) framework has made EBM training more tractable, it still leaves room for improvement in sample quality and sampling speed. CDRL addresses this with a cooperative training strategy: it learns a series of EBMs defined on increasingly noisy versions of the data, each paired with an initializer model. At each noise level the initializer and the EBM are updated together. The initializer proposes a sample at the current noise level from the sample at the next-higher noise level, and a few Markov Chain Monte Carlo (MCMC) steps under the EBM then refine that proposal. The EBM is optimized by maximizing the recovery likelihood, while the initializer is optimized by learning from the difference between the refined samples and its initial proposals. The paper also introduces practical designs for noise scheduling, MCMC sampling, and noise variance reduction to further improve sample quality. Experiments show that CDRL significantly improves generation performance over existing EBM methods on the CIFAR-10 and ImageNet datasets, and that it is useful for downstream tasks such as conditional generation, compositional generation, image inpainting, and out-of-distribution detection. CDRL is also compatible with Classifier-Free Guidance (CFG), which further improves conditional generation.
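The cooperative update described above can be illustrated with a toy one-dimensional sketch. This is not the paper's implementation: it assumes a single noise level, a quadratic energy f(x) = -(x - mu)^2 / 2 with one learnable parameter `mu`, a linear initializer g(x_noisy) = a * x_noisy + b, and Langevin dynamics as the MCMC sampler. All function names, step sizes, and learning rates are hypothetical choices made for illustration.

```python
import random


def energy_grad(x, mu):
    """Gradient in x of the toy log-density f(x) = -(x - mu)^2 / 2 (assumed form)."""
    return -(x - mu)


def langevin_refine(x_init, x_noisy, mu, sigma, step, n_steps, rng):
    """Run a few Langevin steps on the conditional
    log p(x | x_noisy) = f(x) - (x_noisy - x)^2 / (2 sigma^2) + const."""
    x = x_init
    for _ in range(n_steps):
        grad = energy_grad(x, mu) + (x_noisy - x) / sigma ** 2
        x = x + 0.5 * step ** 2 * grad + step * rng.gauss(0.0, 1.0)
    return x


def train_toy_cdrl(n_iters=1500, batch=64, sigma=0.5, seed=0):
    rng = random.Random(seed)
    mu = 0.0           # EBM parameter
    a, b = 1.0, 0.0    # initializer g(x_noisy) = a * x_noisy + b
    lr_ebm, lr_init = 0.1, 0.02
    for _ in range(n_iters):
        g_mu = g_a = g_b = 0.0
        for _ in range(batch):
            x = rng.gauss(3.0, 1.0)                     # clean sample, toy data N(3, 1)
            x_noisy = x + sigma * rng.gauss(0.0, 1.0)   # noisy observation
            x0 = a * x_noisy + b                        # initializer proposal
            x_ref = langevin_refine(x0, x_noisy, mu, sigma, 0.3, 5, rng)
            # Recovery-likelihood gradient: df/dmu at the data sample
            # minus df/dmu at the refined sample.
            g_mu += (x - mu) - (x_ref - mu)
            # Initializer learns from the gap between refined and initial samples
            # (gradient step on the squared error between x0 and x_ref).
            g_a += (x_ref - x0) * x_noisy
            g_b += (x_ref - x0)
        mu += lr_ebm * g_mu / batch
        a += lr_init * g_a / batch
        b += lr_init * g_b / batch
    return mu, a, b


if __name__ == "__main__":
    mu, a, b = train_toy_cdrl()
    print("learned mu:", mu, "initializer:", a, b)
```

In this toy setting the learned `mu` should drift toward the data mean, and the initializer's proposals should move toward the refined samples, so that fewer MCMC steps are needed over time. The full method repeats this pairing across a sequence of noise levels with neural-network EBMs and initializers.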

Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Training Energy-Based Models with Diffusion Contrastive Divergences

Persistently Trained, Diffusion-assisted Energy-based Models

Improving Adversarial Energy-Based Model via Diffusion Process

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

Energy-Based Diffusion Language Models for Text Generation

Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning

Learning Energy-Based Models in High-Dimensional Spaces with Multiscale Denoising-Score Matching

Learning Energy-Based Models in High-Dimensional Spaces with Multi-scale Denoising Score Matching

Guiding Energy-based Models via Contrastive Latent Variables

Learning Energy-based Model via Dual-MCMC Teaching

Efficient Training of Energy-Based Models Using Jarzynski Equality

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

Learning Latent Space Hierarchical EBM Diffusion Models

Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models

Classification Diffusion Models: Revitalizing Density Ratio Estimation

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures