Abstract: Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.
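
For concreteness, the intractability described in the abstract can be written out in standard EBM notation. This is a textbook identity, not a formula quoted from the paper; the energy symbol \( U_\theta \) is an assumption introduced here for illustration, and \( Z_\theta \) denotes the normalization constant.

\[
p_\theta(x) = \frac{e^{-U_\theta(x)}}{Z_\theta}, \qquad Z_\theta = \int e^{-U_\theta(x)}\,dx,
\]
\[
\nabla_\theta \log p_\theta(x) = -\nabla_\theta U_\theta(x) + \mathbb{E}_{x' \sim p_\theta}\left[\nabla_\theta U_\theta(x')\right].
\]

The second term is an expectation under the model itself, which is why training needs samples from \( p_\theta \) and why the quality of the sampler directly affects the quality of the gradient.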

What problem does this paper attempt to address?

The paper addresses the computational challenges of training energy-based models (EBMs), in particular for density estimation on multimodal data distributions. Specifically:

1. **Computational Complexity**: EBMs lack an explicit normalization constant \( Z_\theta \), so the model likelihood cannot be computed in practice. This makes it difficult to train EBMs directly by maximum likelihood.
2. **Sampling Difficulty**: Traditional Markov chain Monte Carlo (MCMC) algorithms and variational inference (VI) techniques can approximate the likelihood gradient, but they handle multimodal distributions poorly, especially when it comes to representing the relative importance of different modes.
3. **Statistical Accuracy**: Existing training methods often overlook the statistical accuracy of the estimated density, such as the relative importance of different classes in a dataset. This matters for applications that require accurate modeling of multimodal data.

To address these problems, the paper proposes a new maximum likelihood training algorithm that uses normalizing flows (NF) to assist the training of EBMs:

- **Combining Normalizing Flows**: A normalizing flow is fitted to the EBM throughout training, so that an NF-assisted sampling scheme provides accurate likelihood-gradient estimates for the EBM. The fitted flow ultimately serves as a fast sampler for generating new data.
- **Adaptive Flow Sampling**: A calibrated MCMC sampler uses independent proposals drawn from the normalizing flow. Because these proposals can jump between modes, the chain mixes quickly across them, which improves training efficiency and statistical accuracy.

According to the paper, this approach addresses the computational and sampling problems of traditional methods, improves model performance on multimodal data, and yields accurate estimates of the relative importance of different modes.
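
The idea of an MCMC sampler driven by independent proposals can be illustrated with a minimal independence Metropolis-Hastings sketch. This is not the paper's algorithm: the fitted normalizing flow is replaced here by a fixed two-component Gaussian mixture with a tractable density, the energy is a hand-written bimodal toy with mode weights 0.8 and 0.2, and all function names are hypothetical.

```python
import math
import random

PROPOSAL_STD = 1.5  # assumed width of the stand-in proposal components


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def energy(x):
    """Toy energy U(x): unnormalized bimodal target with mode weights 0.8 / 0.2."""
    return -log_add_exp(
        math.log(0.8) - 0.5 * (x + 3.0) ** 2,
        math.log(0.2) - 0.5 * (x - 3.0) ** 2,
    )


def proposal_sample(rng):
    """Stand-in for sampling a fitted flow: equal-weight mixture centred at -3 and 3."""
    center = -3.0 if rng.random() < 0.5 else 3.0
    return rng.gauss(center, PROPOSAL_STD)


def proposal_log_density(x):
    """Log density of the stand-in proposal, up to an additive constant."""
    s2 = PROPOSAL_STD ** 2
    return log_add_exp(-0.5 * (x + 3.0) ** 2 / s2, -0.5 * (x - 3.0) ** 2 / s2)


def independence_mh(n_steps, seed=0):
    """Independence Metropolis-Hastings: proposals do not depend on the current state."""
    rng = random.Random(seed)
    x = proposal_sample(rng)
    log_w = -energy(x) - proposal_log_density(x)  # log importance weight of x
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_new = proposal_sample(rng)
        log_w_new = -energy(x_new) - proposal_log_density(x_new)
        # Accept with probability min(1, w(x_new) / w(x)).
        if rng.random() < math.exp(min(0.0, log_w_new - log_w)):
            x, log_w = x_new, log_w_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    samples, rate = independence_mh(20000)
    left = sum(1 for s in samples if s < 0) / len(samples)
    print(f"acceptance rate: {rate:.2f}, mass in left mode: {left:.2f}")
```

Because each proposal is drawn independently of the current state, the chain can move from one mode to the other in a single step, so the fraction of samples in the left mode should settle near the target weight of 0.8 even though the proposal puts equal mass on both modes. A local random-walk sampler started in one mode would typically stay there for a long time and misjudge the mode weights.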

Balanced Training of Energy-Based Models with Adaptive Flow Sampling

Iterated Energy-based Flow Matching for Sampling from Boltzmann Densities

Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows

MCMC Should Mix: Learning Energy-Based Model with Flow-Based Backbone

STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models

Non-Generative Energy Based Models

MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC

Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler

Stochastic Normalizing Flows

Normalizing flow sampling with Langevin dynamics in the latent space

Dynamical Sampling With Langevin Normalization Flows

Efficient Training of Energy-Based Models Using Jarzynski Equality

Learned harmonic mean estimation of the marginal likelihood with normalizing flows

Bounds All Around: Training Energy-Based Models with Bidirectional Bounds

Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle Physics

Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces

Entropy-based Training Methods for Scalable Neural Implicit Samplers

Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows