Can RBMs be trained with zero step contrastive divergence?

Charles K. Fisher
DOI: https://doi.org/10.48550/arXiv.2211.02174
2022-11-04
Abstract: Restricted Boltzmann Machines (RBMs) are probabilistic generative models that can be trained by maximum likelihood in principle, but are usually trained by an approximate algorithm called Contrastive Divergence (CD) in practice. In general, a CD-k algorithm estimates an average with respect to the model distribution using a sample obtained from a k-step Markov Chain Monte Carlo Algorithm (e.g., block Gibbs sampling) starting from some initial configuration. Choices of k typically vary from 1 to 100. This technical report explores if it's possible to leverage a simple approximate sampling algorithm with a modified version of CD in order to train an RBM with k=0. As usual, the method is illustrated on MNIST.
Machine Learning
What problem does this paper attempt to address?
This paper explores whether restricted Boltzmann machines (RBMs) can be trained with zero-step contrastive divergence (CD-0). In principle, RBMs can be trained by maximum likelihood, but in practice they are usually trained with an approximate algorithm, contrastive divergence (CD). A CD-k algorithm estimates an average with respect to the model distribution using a sample obtained from a k-step Markov chain Monte Carlo (MCMC) algorithm (such as block Gibbs sampling) started from some initial configuration; k typically ranges from 1 to 100. This technical report instead combines a simple approximate sampling algorithm with a modified version of CD and attempts to train RBMs without performing any MCMC steps, that is, with k = 0.

The author studied RBMs with discrete visible and hidden units and used Ising-type neurons (taking values ±1) instead of Bernoulli units (taking values 0 or 1), while pointing out that the two model types are theoretically equivalent and can be converted into each other through a simple linear transformation. The aim is to understand whether training really requires a high-quality approximation of the model distribution, or whether RBMs can be trained effectively even with a very rough one.

To test this, the author trained RBMs on the binary MNIST dataset using the CD-0 method. The results show that although samples generated by RBMs trained with CD-0 are not of very high quality, the model learns to create faithful reconstructions because the observed samples are encoded in deep valleys of the energy landscape.
In addition, the author proposed a simple algorithm called "belief generation" for generating approximate samples from RBMs, which may significantly speed up training and allow RBMs to be scaled to previously intractable problem sizes.
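
To make the role of k concrete, here is a minimal NumPy sketch of a standard CD-k gradient estimate for an RBM with ±1 units. It is plain CD-k with block Gibbs sampling, not the modified CD or the approximate sampler used in the report. The energy convention E(v, h) = -a·v - b·h - vᵀWh, the function names, and the choice to start the chain at the data are all assumptions made for illustration.

```python
import numpy as np

def sample_pm1(field, rng):
    """Sample +1/-1 units given their local field: P(+1) = sigmoid(2 * field)."""
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
    return np.where(rng.random(field.shape) < p_plus, 1.0, -1.0)

def cd_k_gradient(v_data, W, a, b, k, rng):
    """Standard CD-k estimate of the log-likelihood gradient for a +1/-1 RBM.

    Assumed energy: E(v, h) = -a.v - b.h - v^T W h.
    v_data has shape (batch, n_visible); W has shape (n_visible, n_hidden).
    """
    # Positive phase: hidden means conditioned on the data, E[h | v] = tanh(field).
    h_mean_data = np.tanh(v_data @ W + b)

    # Negative phase: k block Gibbs steps, with the chain started at the data.
    v_model = v_data.copy()
    for _ in range(k):
        h = sample_pm1(v_model @ W + b, rng)
        v_model = sample_pm1(h @ W.T + a, rng)
    h_mean_model = np.tanh(v_model @ W + b)

    # Gradient estimate: data average minus (approximate) model average.
    n = v_data.shape[0]
    dW = (v_data.T @ h_mean_data - v_model.T @ h_mean_model) / n
    da = (v_data - v_model).mean(axis=0)
    db = (h_mean_data - h_mean_model).mean(axis=0)
    return dW, da, db
```

Under these assumptions, calling `cd_k_gradient` with `k=0` returns an all-zero gradient, because the negative phase is then computed from the data itself and cancels the positive phase exactly. This illustrates why training with k = 0 needs some other source of approximate model samples in place of the Gibbs chain.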