Abstract: Energy-based models (EBMs) offer a flexible framework for probabilistic modelling across various data domains. However, training EBMs on data in discrete or mixed state spaces poses significant challenges due to the lack of robust and fast sampling methods. In this work, we propose to train discrete EBMs with Energy Discrepancy, a loss function which only requires the evaluation of the energy function at data points and their perturbed counterparts, thus eliminating the need for Markov chain Monte Carlo. We introduce perturbations of the data distribution by simulating a diffusion process on the discrete state space endowed with a graph structure. This allows us to inform the choice of perturbation from the structure of the modelled discrete variable, while the continuous time parameter enables fine-grained control of the perturbation. Empirically, we demonstrate the efficacy of the proposed approaches in a wide range of applications, including the estimation of discrete densities with non-binary vocabulary and binary image modelling. Finally, we train EBMs on tabular data sets with applications in synthetic data generation and calibrated classification.
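The abstract describes a loss that only needs energies at data points and at perturbed counterparts of them. The sketch below illustrates such an estimator under stated assumptions: a symmetric perturbation `q(y|x) = q(x|y)`, and a stabilised log-sum-exp form `log(w/M + (1/M) * sum_j exp(U(x) - U(x_j)))` with `M` negatives and a constant `w > 0`, as we understand the Energy Discrepancy literature. The paper's exact estimator may differ, and the names `energy_discrepancy_loss` and `hypercube_perturb` are hypothetical. For binary data we assume the heat-equation perturbation on the hypercube, which flips each bit independently.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, ...]  # one discrete configuration, e.g. a binary vector
Energy = Callable[[State], float]
Perturb = Callable[[State, random.Random], State]


def hypercube_perturb(t: float) -> Perturb:
    """Assumed heat-equation perturbation on {0,1}^d with unit jump rate per edge:
    every bit flips independently with probability (1 - exp(-2t)) / 2."""
    p_flip = 0.5 * (1.0 - math.exp(-2.0 * t))

    def perturb(x: State, rng: random.Random) -> State:
        return tuple(1 - b if rng.random() < p_flip else b for b in x)

    return perturb


def energy_discrepancy_loss(energy: Energy, data: Sequence[State], perturb: Perturb,
                            num_negatives: int = 16, w: float = 1.0,
                            rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of a stabilised Energy Discrepancy loss (sketch).

    Assumes a symmetric perturbation q(y|x) = q(x|y). For each data point x:
    draw y ~ q(.|x), then M negatives x_j ~ q(.|y), and average
        log( w/M + (1/M) * sum_j exp(U(x) - U(x_j)) ).
    Only energy evaluations are needed; no Markov chain is simulated.
    """
    rng = rng if rng is not None else random.Random(0)
    total = 0.0
    for x in data:
        y = perturb(x, rng)  # perturbed counterpart of the data point
        u_x = energy(x)
        # log(w) encodes the stabilising constant w (requires w > 0)
        terms = [math.log(w)] + [u_x - energy(perturb(y, rng)) for _ in range(num_negatives)]
        m = max(terms)  # log-sum-exp shift for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in terms)) - math.log(num_negatives)
    return total / len(data)


if __name__ == "__main__":
    data = [(1, 1, 1, 1, 1, 1, 1, 1)] * 4
    perturb = hypercube_perturb(t=0.5)
    good = energy_discrepancy_loss(lambda x: -float(sum(x)), data, perturb)
    bad = energy_discrepancy_loss(lambda x: float(sum(x)), data, perturb)
    print(good, bad)  # an energy that is low on the data should give the smaller loss
```

The time parameter `t` controls how far negatives move from the data: at `t = 0` every negative equals its data point and the loss collapses to the constant `log((w + M) / M)`, while larger `t` produces more contrastive negatives.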

### What problems does this paper attempt to solve?

This paper addresses the challenges of training energy-based models (EBMs) on discrete and mixed data spaces. Two problems stand out:

1. **Lack of effective sampling methods**: The standard contrastive divergence (CD) algorithm relies on Markov chain Monte Carlo (MCMC) to approximate the gradient of the log-likelihood. In discrete spaces, MCMC struggles to produce fast and accurate samples, so CD lacks theoretical guarantees and yields biased estimates of the energy landscape.
2. **Handling discrete and mixed data**: Existing EBM research focuses mainly on continuous data. Discrete and mixed data types have received comparatively little attention, especially when the data carry additional structure such as a graph structure, periodicity, or ordered categories.

To solve these problems, the authors propose the following:

- **Energy Discrepancy (ED)**: a contrastive loss function that contrasts data points with samples from a perturbed data distribution, which removes the need for MCMC sampling. ED only requires evaluating the energy function at data points and their perturbed versions, which simplifies training.
- **Discrete diffusion process**: By defining a heat equation on the discrete state space, the authors obtain a systematic way to extend Energy Discrepancy to discrete spaces. The perturbation can be chosen according to the structure of the data, and the continuous time parameter gives fine-grained control over its strength.
- **Application to mixed state spaces**: The methods are generalized to mixed state spaces, yielding what the authors describe as the first robust EBM training method for tabular data sets.
These methods provide new tools for downstream tasks such as synthetic data generation and calibrated classification.

### Main contributions of the paper

1. **Exploring discrete diffusion processes**: By defining the heat equation on the discrete state space, the paper studies how the geometric structure of the space and the time parameter shape the diffusion.
2. **Extending Energy Discrepancy to discrete spaces**: Building on the discrete diffusion process, Energy Discrepancy is applied systematically to discrete spaces, giving an MCMC-free training method that reduces the need for hyperparameter tuning.
3. **Handling mixed state spaces**: The method is extended to mixed state spaces, in particular tabular data sets, establishing the first robust EBM training method for this setting and demonstrating its potential for synthetic data generation and calibrated prediction.

Together, these contributions provide a new and effective tool for generative modelling of discrete and mixed data.
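On a finite graph, the heat equation `dq/dt = L q` with graph Laplacian `L = A - D` is solved by the heat kernel `exp(tL)`, whose row `x` is a perturbation distribution `q_t(. | x)`. The sketch below computes this kernel for small state spaces and shows how the graph encodes structure: a cycle for periodic variables (e.g. months), a path for ordered categories, and a complete graph for unordered ones. It also composes the kernel with Gaussian noise on numerical columns to perturb one mixed tabular row. The unit jump rate per edge, the Gaussian treatment of numerical features, and every helper name (`graph_laplacian`, `heat_kernel`, `perturb_mixed_row`, ...) are assumptions for illustration, not the paper's code; the dense matrix products only suit small category sets and moderate `t`.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def graph_laplacian(num_states: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Generator L = A - D of a continuous-time random walk with unit rate per edge."""
    lap = [[0.0] * num_states for _ in range(num_states)]
    for i, j in edges:
        lap[i][j] += 1.0
        lap[j][i] += 1.0
        lap[i][i] -= 1.0
        lap[j][j] -= 1.0
    return lap


def cycle_edges(k: int) -> List[Tuple[int, int]]:     # periodic values, k >= 3
    return [(i, (i + 1) % k) for i in range(k)]


def path_edges(k: int) -> List[Tuple[int, int]]:      # ordered categories
    return [(i, i + 1) for i in range(k - 1)]


def complete_edges(k: int) -> List[Tuple[int, int]]:  # unordered categories
    return [(i, j) for i in range(k) for j in range(i + 1, k)]


def heat_kernel(lap: Matrix, t: float, tol: float = 1e-12, max_terms: int = 10_000) -> Matrix:
    """Approximate exp(t * lap) by uniformisation.

    With lam >= the largest exit rate, P = I + lap / lam is a stochastic matrix and
    exp(t * lap) = sum_k Poisson(k; lam * t) * P^k, a sum of non-negative terms.
    """
    n = len(lap)
    lam = max(-lap[i][i] for i in range(n)) or 1.0
    P = [[float(i == j) + lap[i][j] / lam for j in range(n)] for i in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0 = I
    kernel = [[0.0] * n for _ in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of zero jumps
    mass = 0.0
    for k in range(1, max_terms + 1):
        for i in range(n):
            for j in range(n):
                kernel[i][j] += weight * power[i][j]
        mass += weight
        if 1.0 - mass < tol:  # nearly all Poisson mass has been summed
            break
        weight *= lam * t / k  # Poisson probability of k jumps
        power = [[sum(power[i][m] * P[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return kernel


def sample_neighbour(kernel: Matrix, x: int, rng: random.Random) -> int:
    """Draw y ~ q_t(. | x), i.e. from row x of the heat kernel."""
    return rng.choices(range(len(kernel)), weights=kernel[x])[0]


def perturb_mixed_row(row: Sequence[float], kernels: Dict[int, Matrix],
                      sigma: float, rng: random.Random) -> Tuple:
    """Hypothetical mixed perturbation: columns listed in `kernels` are categorical and
    diffused on their graph; all other columns receive Gaussian noise with std `sigma`."""
    out = []
    for col, value in enumerate(row):
        if col in kernels:
            out.append(sample_neighbour(kernels[col], int(value), rng))
        else:
            out.append(value + rng.gauss(0.0, sigma))
    return tuple(out)


if __name__ == "__main__":
    month_kernel = heat_kernel(graph_laplacian(12, cycle_edges(12)), t=0.5)
    print(perturb_mixed_row((3.7, 11, 42.0), {1: month_kernel}, sigma=0.1,
                            rng=random.Random(0)))
```

Uniformisation is used here because every term in the series is non-negative, so each row stays a valid probability vector; for larger graphs a library matrix exponential such as `scipy.linalg.expm` would be the more practical choice.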

Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces
