Adiabatic Persistent Contrastive Divergence Learning

Hyeryung Jang, Hyungwon Choi, Yung Yi, Jinwoo Shin
DOI: https://doi.org/10.48550/arXiv.1605.08174
2017-02-14
Abstract: This paper studies the problem of parameter learning in probabilistic graphical models having latent variables, where the standard approach is the expectation maximization algorithm alternating expectation (E) and maximization (M) steps. However, both E and M steps are computationally intractable for high dimensional data, while the substitution of one step to a faster surrogate for combating against intractability can often cause failure in convergence. We propose a new learning algorithm which is computationally efficient and provably ensures convergence to a correct optimum. Its key idea is to run only a few cycles of Markov Chains (MC) in both E and M steps. Such an idea of running incomplete MC has been well studied only for M step in the literature, called Contrastive Divergence (CD) learning. While such known CD-based schemes find approximated gradients of the log-likelihood via the mean-field approach in E step, our proposed algorithm does exact ones via MC algorithms in both steps due to the multi-time-scale stochastic approximation theory. Despite its theoretical guarantee in convergence, the proposed scheme might suffer from the slow mixing of MC in E step. To tackle it, we also propose a hybrid approach applying both mean-field and MC approximation in E step, where the hybrid approach outperforms the bare mean-field CD scheme in our experiments on real-world datasets.
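To make the two intractable expectations concrete, here is a standard identity for exponential-family models with latent variables. The notation is assumed for illustration and does not come from the abstract: $\theta$ denotes the parameters, $\phi(v, h)$ the sufficient statistics of visible variables $v$ and latent variables $h$, and $Z(\theta)$ the partition function.

```latex
p_\theta(v, h) = \frac{\exp\big(\langle \theta, \phi(v, h) \rangle\big)}{Z(\theta)},
\qquad
\nabla_\theta \log p_\theta(v)
  = \mathbb{E}_{h \sim p_\theta(\cdot \mid v)}\big[\phi(v, h)\big]
  - \mathbb{E}_{(v', h') \sim p_\theta}\big[\phi(v', h')\big].
```

The first expectation is over the latent posterior given the data, and the second is over the full model distribution. Both are sums over exponentially many configurations. Read loosely, the first corresponds to the abstract's E step and the second to the quantity that CD approximates with a few Markov chain cycles.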
Machine Learning
What problem does this paper attempt to address?
This paper addresses the computational cost of parameter learning in probabilistic graphical models with latent variables. Specifically:

1. **Standard methods and their limitations**: The standard approach is the Expectation-Maximization (EM) algorithm, which alternates Expectation (E) and Maximization (M) steps. For high-dimensional data, both steps are computationally intractable, and replacing either step with a faster surrogate can cause the algorithm to fail to converge.
2. **Deficiencies of existing solutions**: Existing Contrastive Divergence (CD) learning schemes use a mean-field approximation in the E step and run only a few Markov chain (MC) cycles in the M step. This improves computational efficiency, but it yields only approximate gradients and optimizes a lower bound on the log-likelihood rather than the log-likelihood itself.
3. **The new method proposed in the paper**: The paper proposes a new learning algorithm, Adiabatic Persistent Contrastive Divergence (APCD). Its key idea is to run only a few Markov chain cycles in both the E and M steps, instead of using a mean-field approximation in the E step as existing CD schemes do. As a result, APCD optimizes the actual log-likelihood rather than an approximation or lower bound of it.
4. **Theoretical guarantees and practical performance**: Using multi-time-scale stochastic approximation theory, the paper proves that APCD converges to a local optimum of the actual log-likelihood. Because the Markov chain in the E step may mix slowly, the authors also design a hybrid scheme that combines mean-field and Markov chain approximation in the E step; in their experiments on real-world datasets it outperforms the pure mean-field CD scheme.

In summary, the paper targets both the computational cost and the convergence of parameter learning in probabilistic graphical models with latent variables, and proposes an algorithm that is efficient and provably converges to a correct optimum.
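The idea of running only a few Markov chain cycles in both steps can be illustrated with a toy sketch. It is not the authors' APCD algorithm and carries none of its convergence guarantees. Everything in it is an assumption made for illustration: a tiny binary Boltzmann machine without biases, hidden-hidden couplings `J` so that the latent posterior does not factorize, the helper names `gibbs_hidden`, `gibbs_visible` and `train`, and the step-size schedule `0.5 / (10 + t)`. Persistent chains advance by one Gibbs sweep per iteration (the fast time scale), while the parameters move with a decaying step size (the slow time scale).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gibbs_hidden(v, h, W, J, rng):
    """One in-place Gibbs sweep over the hidden units, visible units held fixed."""
    for j in range(len(h)):
        field = sum(v[i] * W[i][j] for i in range(len(v)))
        field += sum(J[j][k] * h[k] for k in range(len(h)) if k != j)
        h[j] = 1 if rng.random() < sigmoid(field) else 0


def gibbs_visible(v, h, W, rng):
    """One in-place Gibbs sweep over the visible units, hidden units held fixed."""
    for i in range(len(v)):
        field = sum(W[i][j] * h[j] for j in range(len(h)))
        v[i] = 1 if rng.random() < sigmoid(field) else 0


def train(data, n_hidden, steps=200, seed=0):
    """Toy two-time-scale learner (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    n_visible, n = len(data[0]), len(data)
    W = [[0.0] * n_hidden for _ in range(n_visible)]  # visible-hidden weights
    J = [[0.0] * n_hidden for _ in range(n_hidden)]   # symmetric hidden-hidden weights
    # Persistent chains: hidden states clamped to each example ("E step") ...
    pos_h = [[rng.randint(0, 1) for _ in range(n_hidden)] for _ in data]
    # ... and free-running (v, h) states ("M step").
    neg_v = [[rng.randint(0, 1) for _ in range(n_visible)] for _ in data]
    neg_h = [[rng.randint(0, 1) for _ in range(n_hidden)] for _ in data]

    for t in range(1, steps + 1):
        # Fast time scale: a single Gibbs sweep per chain, never run to convergence.
        for m in range(n):
            gibbs_hidden(data[m], pos_h[m], W, J, rng)
            gibbs_hidden(neg_v[m], neg_h[m], W, J, rng)
            gibbs_visible(neg_v[m], neg_h[m], W, rng)

        # Slow time scale: decaying step size (sum diverges, sum of squares converges).
        lr = 0.5 / (10.0 + t)
        for i in range(n_visible):
            for j in range(n_hidden):
                g = sum(data[m][i] * pos_h[m][j] - neg_v[m][i] * neg_h[m][j]
                        for m in range(n)) / n
                W[i][j] += lr * g
        for j in range(n_hidden):
            for k in range(j + 1, n_hidden):
                g = sum(pos_h[m][j] * pos_h[m][k] - neg_h[m][j] * neg_h[m][k]
                        for m in range(n)) / n
                J[j][k] += lr * g
                J[k][j] = J[j][k]  # keep the coupling matrix symmetric
    return W, J
```

For example, `W, J = train([[1, 1, 0, 0], [0, 0, 1, 1]], n_hidden=3)` returns a 4×3 weight matrix and a symmetric 3×3 coupling matrix with a zero diagonal. In this sketch, a mean-field variant would replace the clamped-chain sweep `gibbs_hidden(data[m], pos_h[m], ...)` with a deterministic fixed-point update of hidden probabilities.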