Abstract: The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many established as well as many novel algorithms for unsupervised learning. During learning such algorithms change model parameters to increase the variational lower bound. Learning usually proceeds until parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is at all stationary points of learning equal to a sum of entropies. For standard machine learning models with one set of latents and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distribution. The obtained result applies under realistic conditions including: finite numbers of data points, at any stationary point (including saddle points) and for any family of (well behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA and (Gaussian and non-Gaussian) mixture models. The result also applies for standard (Gaussian) variational autoencoders, a special case that has been shown previously (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family, and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
What problem does this paper attempt to address?
### The problems the paper attempts to solve
The paper asks what value the variational lower bound (also known as the ELBO or free energy) takes once learning has converged. Specifically, it proves that, for a large class of generative models, the ELBO equals a sum of three entropies at every stationary point of learning:
1. The average entropy of the variational distributions \( \frac{1}{N} \sum_{n} H[q^{(n)}_\Phi(\vec{z})] \)
2. The negative entropy of the model prior distribution \( -H[p_\Theta(\vec{z})] \)
3. The expected negative entropy of the observable distribution \( -\frac{1}{N} \sum_{n} \mathbb{E}_{q^{(n)}_\Phi(\vec{z})}[H[p_\Theta(\vec{x}|\vec{z})]] \)
The result holds under realistic conditions: a finite number of data points, any stationary point (including saddle points), and any well-behaved family of variational distributions. Its prerequisites are that the model's distributions belong to the exponential family and that the model satisfies a parameterization criterion, which is usually fulfilled. The paper derives the result mathematically and discusses concrete examples: Sigmoid Belief Networks (SBN), probabilistic PCA, and Gaussian and non-Gaussian mixture models. Standard Gaussian variational autoencoders are also covered, a special case shown previously (Damm et al., 2023).
### Formula representation
The key formulas in the paper are as follows:
1. **Definition of the variational lower bound**:
\[
F(\Phi, \Theta) = \frac{1}{N} \sum_{n} \int q^{(n)}_\Phi(\vec{z}) \log \left( \frac{p_\Theta(\vec{x}^{(n)}|\vec{z}) p_\Theta(\vec{z})}{q^{(n)}_\Phi(\vec{z})} \right) d\vec{z}
\]
This can be decomposed into:
\[
F(\Phi, \Theta) = \frac{1}{N} \sum_{n} \int q^{(n)}_\Phi(\vec{z}) \log(p_\Theta(\vec{x}^{(n)}|\vec{z})) d\vec{z} - \frac{1}{N} \sum_{n} D_{KL}[q^{(n)}_\Phi(\vec{z}) \| p_\Theta(\vec{z})]
\]
2. **Form of the sum of entropies (valid at stationary points of learning)**:
\[
F(\Phi, \Theta) = \frac{1}{N} \sum_{n} H[q^{(n)}_\Phi(\vec{z})] - H[p_\Theta(\vec{z})] - \frac{1}{N} \sum_{n} \mathbb{E}_{q^{(n)}_\Phi(\vec{z})} \left[ H[p_\Theta(\vec{x}|\vec{z})] \right]
\]
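The first formula above, including its split into a reconstruction term and a KL term, is an algebraic identity that holds for any variational distributions and any parameters, whereas the entropy-sum form holds only at stationary points. The following minimal sketch checks the identity numerically on a toy one-dimensional Gaussian mixture with discrete latents. The model, its parameter values, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 3

# Toy 1-D Gaussian mixture with unit variances (all values are illustrative).
x = rng.normal(size=N)
prior = np.array([0.2, 0.5, 0.3])    # p(z = k)
means = np.array([-1.0, 0.0, 2.0])   # component means

# log p(x_n | z = k), shape (N, K)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[:, None] - means[None, :]) ** 2

# Arbitrary (non-optimal) variational distributions q^(n)(z), one row per point.
q = rng.dirichlet(np.ones(K), size=N)


def elbo_definition(q, log_lik, prior):
    """(1/N) sum_n sum_k q_nk * log( p(x_n|k) p(k) / q_nk )."""
    return np.mean(np.sum(q * (log_lik + np.log(prior) - np.log(q)), axis=1))


def elbo_decomposed(q, log_lik, prior):
    """Expected log-likelihood minus average KL[q^(n) || prior]."""
    reconstruction = np.mean(np.sum(q * log_lik, axis=1))
    kl = np.mean(np.sum(q * (np.log(q) - np.log(prior)), axis=1))
    return reconstruction - kl


def mixture_log_likelihood(log_lik, prior):
    """Average log p(x_n), computed with a stable log-sum-exp over k."""
    return np.mean(np.logaddexp.reduce(log_lik + np.log(prior), axis=1))


print(elbo_definition(q, log_lik, prior))
print(elbo_decomposed(q, log_lik, prior))
print(mixture_log_likelihood(log_lik, prior))
```

The first two printed values agree up to floating-point error, and both lie below the third, as expected of a lower bound on the average log-likelihood.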
### Main contributions
The main contribution of the paper is the proof that, under the stated conditions, the variational lower bound equals the sum of the three entropies above at every stationary point of learning. Beyond its theoretical interest, the result offers a new perspective for applications such as analyzing the optimization landscape and the posterior collapse phenomenon in variational autoencoders (VAEs).
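As a numerical illustration of this claim, the sketch below uses probabilistic PCA, one of the examples the paper discusses. It makes two assumptions that are not taken from the paper's text: the stationary point is obtained from the standard closed-form maximum-likelihood solution for probabilistic PCA (eigendecomposition of the sample covariance), and the variational distributions are the exact Gaussian posteriors, so the ELBO coincides with the average log-likelihood. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def ppca_ml(X, H):
    """Closed-form maximum-likelihood parameters of probabilistic PCA."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # sample covariance (1/N)
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]              # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[H:].mean()                    # mean of discarded eigenvalues
    W = evecs[:, :H] * np.sqrt(evals[:H] - sigma2)
    return W, mu, sigma2


def avg_log_likelihood(X, W, mu, sigma2):
    """Average log N(x_n; mu, W W^T + sigma2 I); equals the ELBO for exact posteriors."""
    D = X.shape[1]
    C = W @ W.T + sigma2 * np.eye(D)
    Xc = X - mu
    _, logdet = np.linalg.slogdet(C)
    quad = np.einsum("nd,nd->n", Xc @ np.linalg.inv(C), Xc)
    return np.mean(-0.5 * (D * np.log(2 * np.pi) + logdet + quad))


def entropy_sum(W, sigma2):
    """H[q] - H[p(z)] - H[p(x|z)] for Gaussian PPCA with exact posteriors."""
    D, H = W.shape
    M = W.T @ W + sigma2 * np.eye(H)
    _, logdet_M = np.linalg.slogdet(M)
    # Posterior covariance is sigma2 * M^{-1}, identical for every data point.
    h_q = 0.5 * H * np.log(2 * np.pi * np.e) + 0.5 * (H * np.log(sigma2) - logdet_M)
    h_prior = 0.5 * H * np.log(2 * np.pi * np.e)          # z ~ N(0, I_H)
    h_obs = 0.5 * D * np.log(2 * np.pi * np.e * sigma2)   # x|z ~ N(Wz + mu, sigma2 I_D)
    return h_q - h_prior - h_obs


# Synthetic data from a PPCA model (sizes and scales are illustrative).
rng = np.random.default_rng(1)
N, D, H = 500, 5, 2
W_true = 2.0 * rng.normal(size=(D, H))
X = rng.normal(size=(N, H)) @ W_true.T + 0.3 * rng.normal(size=(N, D)) + 1.0

W, mu, sigma2 = ppca_ml(X, H)
print(avg_log_likelihood(X, W, mu, sigma2), entropy_sum(W, sigma2))

# Away from the stationary point the two quantities generally differ.
print(avg_log_likelihood(X, 1.5 * W, mu, sigma2), entropy_sum(1.5 * W, sigma2))
```

At the maximum-likelihood parameters the two printed numbers match up to floating-point error. After rescaling `W`, which moves the parameters away from the stationary point, they no longer match, which illustrates that the equality is a property of stationary points rather than a general identity.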
### Conclusion
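The paper gives a closed-form characterization of the variational lower bound at stationary points of learning: for generative models whose distributions belong to the exponential family and that satisfy a mild parameterization criterion, the bound reduces to a sum of entropies. This characterization helps in understanding and analyzing the learning process of generative models.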