Abstract: Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is that of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of mixtures of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first is an optimization result: a descent lemma establishing that the algorithm decreases the objective at each iteration. The second is an approximation error: an upper bound on the objective between an optimal finite mixture and the target distribution.
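
To make the objects in the abstract concrete, the objective and the particle update can be written out as follows. This is a sketch under assumed notation, not the paper's own statement: the symbols $x_i$ (particle positions), $k_\epsilon$ (Gaussian kernel), $\mu_n$ (atomic measure), $\pi$ (target), $\gamma$ (step size), and the isotropic covariance $\epsilon I_d$ are illustrative choices. The abstract only specifies a fixed covariance and constant weights.

```latex
\begin{align*}
\mu_n &= \frac{1}{n}\sum_{i=1}^{n}\delta_{x_i}, \\
(k_\epsilon \star \mu_n)(y) &= \frac{1}{n}\sum_{i=1}^{n}\mathcal{N}\!\left(y;\, x_i,\, \epsilon I_d\right), \\
\mathcal{F}_\epsilon(\mu_n) &= \mathrm{KL}\!\left(k_\epsilon \star \mu_n \,\middle\|\, \pi\right)
  = \int (k_\epsilon \star \mu_n)(y)\,\log\frac{(k_\epsilon \star \mu_n)(y)}{\pi(y)}\,\mathrm{d}y, \\
x_i^{(t+1)} &= x_i^{(t)} - \gamma\,\mathbb{E}_{z\sim\mathcal{N}(0, I_d)}\!\left[\nabla \log\frac{k_\epsilon \star \mu_n^{(t)}}{\pi}\!\left(x_i^{(t)} + \sqrt{\epsilon}\,z\right)\right].
\end{align*}
```

The expectation in the last line equals $n\,\nabla_{x_i}\mathcal{F}_\epsilon$, so the update is gradient descent on the positions up to a rescaling of the step size. Each particle's update depends on all other particles through $k_\epsilon \star \mu_n^{(t)}$, which is what makes the scheme an interacting particle system.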

What problem does this paper attempt to address?

The paper addresses the lack of theoretical guarantees for Variational Inference (VI) when the variational family is not Gaussian. It focuses on mixtures of Gaussians with fixed covariance and equal weights, and makes two main contributions:

1. **Optimization Guarantees**:
   - In this simplified setting, where all mixture components share the same diagonal covariance matrix and have equal weights, VI amounts to minimizing a mollified relative entropy: the (reverse) KL divergence between the Gaussian convolution of an atomic measure and the target distribution.
   - Under smoothness and moment assumptions, the authors prove a descent lemma: Wasserstein gradient descent on this objective, i.e. gradient descent on the particle positions, decreases the objective at each iteration.
2. **Approximation Guarantees**:
   - The paper studies how well a finite mixture of Gaussians can approximate the posterior distribution in reverse KL divergence. Under the paper's assumptions on the target, the approximation error decreases and converges to zero as the number of mixture components increases.
   - A non-asymptotic rate quantifies the approximation quality of the family of Gaussian mixtures with fixed covariance and equal weights in reverse KL divergence.

Together, these analyses give theoretical support for variational inference beyond the Gaussian case, in particular where a single Gaussian struggles to approximate a multimodal posterior.
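
The descent behaviour described above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it assumes a one-dimensional bimodal target, an isotropic component variance `eps`, and Gauss-Hermite quadrature for the Gaussian expectations. All function names, the target, the step size, and the initial positions are hypothetical choices made for illustration, and the step size is not derived from the paper's descent lemma.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def mixture_logpdf_and_score(y, means, var):
    """Equal-weight 1D Gaussian mixture: log density and its derivative at points y."""
    diff = means[None, :] - y[:, None]            # shape (num_points, num_components)
    log_comp = -0.5 * diff**2 / var - 0.5 * np.log(2 * np.pi * var)
    top = log_comp.max(axis=1, keepdims=True)     # subtract the max for numerical stability
    w = np.exp(log_comp - top)
    logp = top[:, 0] + np.log(w.sum(axis=1)) - np.log(means.size)
    score = (w * diff).sum(axis=1) / (w.sum(axis=1) * var)
    return logp, score


def mollified_kl_and_velocity(x, eps, target_means, target_var, nodes, weights):
    """KL(mixture || target) and the per-particle descent direction, by quadrature."""
    n, m = x.size, nodes.size
    # y[i, k] = x_i + sqrt(eps) * z_k are the quadrature points for component i.
    y = (x[:, None] + np.sqrt(eps) * nodes[None, :]).ravel()
    log_q, score_q = mixture_logpdf_and_score(y, x, eps)
    log_pi, score_pi = mixture_logpdf_and_score(y, target_means, target_var)
    kl = ((log_q - log_pi).reshape(n, m) @ weights).mean()
    velocity = (score_q - score_pi).reshape(n, m) @ weights
    return kl, velocity


def run_particle_descent(n_particles=6, eps=0.5, step=0.05, n_iters=300):
    """Gradient descent on the component means; returns final means and KL history."""
    nodes, weights = hermegauss(60)
    weights = weights / weights.sum()             # normalise to a standard normal expectation
    target_means = np.array([-2.0, 2.0])          # assumed bimodal target, unit variance
    target_var = 1.0
    x = np.linspace(-1.0, 3.0, n_particles)       # arbitrary initial positions
    history = []
    for _ in range(n_iters):
        kl, velocity = mollified_kl_and_velocity(
            x, eps, target_means, target_var, nodes, weights)
        history.append(kl)
        x = x - step * velocity
    kl, _ = mollified_kl_and_velocity(x, eps, target_means, target_var, nodes, weights)
    history.append(kl)
    return x, np.array(history)


if __name__ == "__main__":
    x_final, history = run_particle_descent()
    print("initial KL:", history[0])
    print("final KL:", history[-1])
    print("final component means:", np.sort(x_final))
```

Quadrature is used instead of Monte Carlo sampling so that the recorded objective is deterministic and the per-iteration decrease can be inspected directly. In higher dimensions one would typically replace it with sampled Gaussian noise.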

Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
