Abstract: This paper studies parameter learning in probabilistic graphical models with latent variables, where the standard approach is the expectation maximization (EM) algorithm, which alternates expectation (E) and maximization (M) steps. However, both the E and M steps are computationally intractable for high-dimensional data, and replacing either step with a faster surrogate to combat intractability often causes convergence to fail. We propose a new learning algorithm that is computationally efficient and provably converges to a correct optimum. Its key idea is to run only a few cycles of Markov chains (MC) in both the E and M steps. Running incomplete MC in this way has been well studied in the literature only for the M step, where it is called Contrastive Divergence (CD) learning. While known CD-based schemes find approximate gradients of the log-likelihood via the mean-field approach in the E step, our proposed algorithm finds exact ones via MC algorithms in both steps, which is justified by multi-time-scale stochastic approximation theory. Despite its theoretical convergence guarantee, the proposed scheme might suffer from slow mixing of MC in the E step. To tackle this, we also propose a hybrid approach that applies both mean-field and MC approximation in the E step; in our experiments on real-world datasets, the hybrid approach outperforms the bare mean-field CD scheme.

What problem does this paper attempt to address?

This paper addresses the computational intractability of parameter learning in probabilistic graphical models with latent variables. Specifically:

1. **Standard methods and their limitations**: The standard parameter-learning method is the Expectation-Maximization (EM) algorithm, which alternates between an Expectation (E) step and a Maximization (M) step. For high-dimensional data, both steps are computationally intractable, and replacing either one with a faster surrogate can cause the algorithm to fail to converge.
2. **Deficiencies of existing solutions**: Existing Contrastive Divergence (CD) learning methods use the mean-field approximation in the E-step and run a small number of Markov chain (MC) cycles in the M-step. This improves computational efficiency, but it yields only approximate gradients and optimizes a lower bound of the log-likelihood rather than the log-likelihood itself.
3. **The new method proposed in the paper**: The paper proposes a new learning algorithm, Adiabatic Persistent Contrastive Divergence (APCD) learning. Its key idea is to run a small number of Markov chain cycles in both the E and M steps, rather than using the mean-field approximation in the E-step as existing CD-based schemes do. In this way, APCD optimizes the actual log-likelihood function directly, rather than an approximation or lower bound of it.
4. **Theoretical guarantees and practical performance**: Using multi-time-scale stochastic approximation theory, the paper proves that APCD converges to a local optimum of the actual log-likelihood. To deal with the slow mixing of MC that may occur in the E-step, the authors also design a hybrid scheme that applies both mean-field and MC approximation in the E-step; in their experiments on real-world datasets it outperforms the pure mean-field CD scheme.

In summary, this paper aims to solve the intractability and convergence problems of parameter learning in probabilistic graphical models with latent variables, and proposes a new algorithm that is both efficient and provably convergent to a correct optimum.
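The idea of running only a few Markov chain cycles in both steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the model (a small binary Boltzmann machine with hidden-hidden couplings `J`, so that the E-step posterior does not factorize), the function names, the number of persistent chains, and the decaying step-size schedule are all assumptions chosen for illustration. One persistent chain keeps the visible units clamped to the data (E-step side), another runs freely over all units (M-step side), and each iteration advances both by only `k` Gibbs sweeps before a small parameter update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sweep_hidden(v, h, W, J, c, rng):
    """One sequential Gibbs sweep over the hidden units with v held fixed.

    J is symmetric with a zero diagonal, so h @ J[:, j] only involves
    the other hidden units.
    """
    for j in range(h.shape[1]):
        act = c[j] + v @ W[:, j] + h @ J[:, j]
        h[:, j] = rng.random(h.shape[0]) < sigmoid(act)
    return h

def sample_visible(h, W, b, rng):
    """Visible units are conditionally independent given h (no v-v couplings)."""
    p = sigmoid(b + h @ W.T)
    return (rng.random(p.shape) < p).astype(float)

def train(data, n_hidden=4, n_iters=200, k=1, seed=0):
    """Hypothetical two-chain persistent training loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    W = 0.01 * rng.standard_normal((d, n_hidden))
    J = np.zeros((n_hidden, n_hidden))
    b = np.zeros(d)
    c = np.zeros(n_hidden)

    # Persistent chain states, carried across parameter updates.
    h_pos = rng.integers(0, 2, (n, n_hidden)).astype(float)  # clamped chain (E side)
    v_neg = rng.integers(0, 2, (n, d)).astype(float)         # free chain (M side)
    h_neg = rng.integers(0, 2, (n, n_hidden)).astype(float)

    for t in range(n_iters):
        # Assumed schedule: decaying steps, so the parameters move slowly
        # relative to the chains (the multi-time-scale intuition).
        lr = 0.1 / (1 + t) ** 0.6

        for _ in range(k):  # only a few MC cycles per update
            h_pos = sweep_hidden(data, h_pos, W, J, c, rng)
            h_neg = sweep_hidden(v_neg, h_neg, W, J, c, rng)
            v_neg = sample_visible(h_neg, W, b, rng)

        # Stochastic gradient estimate: clamped statistics minus free statistics.
        W += lr * (data.T @ h_pos - v_neg.T @ h_neg) / n
        dJ = (h_pos.T @ h_pos - h_neg.T @ h_neg) / n
        np.fill_diagonal(dJ, 0.0)  # no self-couplings
        J += lr * dJ
        b += lr * (data.mean(axis=0) - v_neg.mean(axis=0))
        c += lr * (h_pos.mean(axis=0) - h_neg.mean(axis=0))
    return W, J, b, c

if __name__ == "__main__":
    toy = (np.random.default_rng(1).random((32, 6)) < 0.5).astype(float)
    W, J, b, c = train(toy, n_hidden=3, n_iters=50, k=2)
    print(W.shape, J.shape)
```

A mean-field CD scheme would replace the clamped chain `h_pos` with a deterministic fixed-point estimate of the hidden means. In this sketch, keeping a sampled persistent chain there instead is what lets the update target the log-likelihood gradient itself rather than a bound, at the cost of depending on how well that chain mixes.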

Adiabatic Persistent Contrastive Divergence Learning

Convergence of Contrastive Divergence Algorithm in Exponential Family

Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning

A Neighbourhood-Based Stopping Criterion for Contrastive Divergence Learning

Training Energy-Based Models with Diffusion Contrastive Divergences

Contrastive learning of strong-mixing continuous-time stochastic processes

A Cyclic Contrastive Divergence Learning Algorithm for High-Order RBMs

Why (and When and How) Contrastive Divergence Works

Learning Multi-Layer Latent Variable Model with Short Run MCMC Inference Dynamics

Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

Channel-aware Contrastive Conditional Diffusion for Multivariate Probabilistic Time Series Forecasting

Probabilistic Contrastive Learning for Domain Adaptation

Gradient Descent Temporal Difference-Difference Learning

A Convergent ADMM Framework for Efficient Neural Network Training

Understanding Contrastive Learning via Distributionally Robust Optimization

EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit

Variational Hierarchical Mixtures for Probabilistic Learning of Inverse Dynamics