Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians

Tom Huix, Anna Korba, Alain Durmus, Eric Moulines
2024-06-10
Abstract: Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only recently received attention, and mostly when the parametric family is the family of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of mixtures of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL divergence between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first is an optimization result: a descent lemma establishing that the algorithm decreases the objective at each iteration. The second is an approximation error, which upper bounds the objective between an optimal finite mixture and the target distribution.
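For reference, the objective described in the abstract can be written out as follows. The notation is chosen here for illustration and may differ from the paper's: $\pi$ is the target, $x_1, \dots, x_n$ are the particles, $k_\epsilon$ is the Gaussian kernel, and $\gamma$ is a step size. The isotropic covariance $\epsilon^2 I$ is an assumption in keeping with the "fixed variance" of the title, and the normalisation of the step size is left unspecified.

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}, \qquad
k_\epsilon \star \mu_n = \frac{1}{n}\sum_{i=1}^{n} \mathcal{N}(x_i, \epsilon^2 I), \qquad
\mathcal{F}_\epsilon(\mu_n) = \mathrm{KL}\left(k_\epsilon \star \mu_n \,\middle\|\, \pi\right)
$$

$$
x_i^{(t+1)} = x_i^{(t)} - \gamma\, \nabla_{x_i} \mathcal{F}_\epsilon\big(\mu_n^{(t)}\big), \qquad i = 1, \dots, n
$$

Each particle's gradient depends on all the others through the mixture density $k_\epsilon \star \mu_n$, which is what makes the scheme an interacting particle system.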
Machine Learning
What problem does this paper attempt to address?
The paper addresses the lack of theoretical guarantees for Variational Inference (VI) beyond the Gaussian case. Specifically, it studies VI over mixtures of Gaussians with fixed variance and makes the following main contributions:

1. **Optimization Guarantees**:
   - The paper studies a simplified VI setting in which the Gaussian mixture components share the same fixed covariance matrix and have equal weights. In this setting the objective is a mollified relative entropy: the (reverse) KL divergence between the Gaussian convolution of an atomic measure, i.e. the mixture itself, and the target distribution.
   - The authors prove a descent lemma: under certain smoothness and moment assumptions, gradient descent on the particle positions decreases the mollified relative entropy at each iteration.
2. **Approximation Guarantees**:
   - The paper further studies how well a finite mixture of Gaussians can approximate the target distribution in reverse KL divergence. As the number of mixture components increases, the approximation error decreases and converges to zero.
   - A non-asymptotic rate is provided, quantifying the approximation quality of the family of Gaussian mixtures with fixed variance and equal weights in reverse KL divergence.

Together, these results give theoretical support for variational inference in non-Gaussian settings, in particular for multimodal targets that a single Gaussian approximates poorly.
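The particle scheme summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the one-dimensional bimodal target, the shared variance `var`, the step size, the Monte Carlo gradient estimate, and all function names. The gradient uses the reparameterization `y = x_i + sqrt(var) * z`, under which the derivative of the KL with respect to particle `x_i` is proportional to the expectation of `score_q(y) - score_pi(y)`.

```python
import numpy as np

def mixture_logpdf(y, centers, var):
    """Log-density at y (m, d) of an equal-weight mixture of N(c, var * I), c in centers (n, d)."""
    n, d = centers.shape
    sq = np.sum((y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)  # (m, n)
    log_comp = -0.5 * sq / var - 0.5 * d * np.log(2.0 * np.pi * var)
    top = log_comp.max(axis=1, keepdims=True)  # stabilised log-sum-exp
    return top[:, 0] + np.log(np.exp(log_comp - top).sum(axis=1)) - np.log(n)

def mixture_score(y, centers, var):
    """Gradient in y of mixture_logpdf: responsibility-weighted pull toward the centers."""
    diff = centers[None, :, :] - y[:, None, :]  # (m, n, d)
    logits = -0.5 * np.sum(diff ** 2, axis=-1) / var
    resp = np.exp(logits - logits.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    return np.einsum("mn,mnd->md", resp, diff) / var

def draw_samples(particles, var, noise):
    """Reparameterized draws from the mixture: particles[i] + sqrt(var) * noise[i, k]."""
    n, m, d = noise.shape
    return (particles[:, None, :] + np.sqrt(var) * noise).reshape(n * m, d)

def kl_estimate(particles, target_centers, var, noise):
    """Monte Carlo estimate of KL(q || pi), with q the mixture centred at the particles."""
    y = draw_samples(particles, var, noise)
    return np.mean(mixture_logpdf(y, particles, var) - mixture_logpdf(y, target_centers, var))

def gradient_step(particles, target_centers, var, noise, step):
    """One gradient-descent step on the particle positions (interacting particle update)."""
    n, m, d = noise.shape
    y = draw_samples(particles, var, noise)
    drift = mixture_score(y, particles, var) - mixture_score(y, target_centers, var)
    grad = drift.reshape(n, m, d).mean(axis=1)  # n times the KL gradient, in expectation
    return particles - step * grad

rng = np.random.default_rng(0)
var = 1.0                                    # assumed shared variance (mixture and target)
target_centers = np.array([[-2.0], [2.0]])   # assumed bimodal toy target
particles = np.linspace(-1.0, 1.0, 4).reshape(-1, 1)

eval_noise = rng.standard_normal((4, 2000, 1))  # reused so both KL estimates are comparable
kl_before = kl_estimate(particles, target_centers, var, eval_noise)
for _ in range(500):
    noise = rng.standard_normal((4, 256, 1))
    particles = gradient_step(particles, target_centers, var, noise, step=0.1)
kl_after = kl_estimate(particles, target_centers, var, eval_noise)
print(kl_before, kl_after, particles.ravel())
```

The score of the mixture acts as a repulsion term between particles, while the score of the target attracts them toward its modes. Because the gradient is estimated by sampling, individual steps in this sketch are noisy and need not decrease the objective exactly; the descent lemma concerns the objective itself.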