Minimizing Convex Functionals over Space of Probability Measures via KL Divergence Gradient Flow

Rentian Yao, Linjun Huang, Yun Yang
2023-11-02
Abstract: Motivated by the computation of the non-parametric maximum likelihood estimator (NPMLE) and the Bayesian posterior in statistics, this paper explores the problem of convex optimization over the space of all probability distributions. We introduce an implicit scheme, called the implicit KL proximal descent (IKLPD) algorithm, for discretizing a continuous-time gradient flow relative to the Kullback-Leibler divergence for minimizing a convex target functional. We show that IKLPD converges to a global optimum at a polynomial rate from any initialization; moreover, if the objective functional is strongly convex relative to the KL divergence, for example, when the target functional itself is a KL divergence as in the context of Bayesian posterior computation, IKLPD exhibits globally exponential convergence. Computationally, we propose a numerical method based on normalizing flow to realize IKLPD. Conversely, our numerical method can also be viewed as a new approach that sequentially trains a normalizing flow for minimizing a convex functional with a strong theoretical guarantee.
Statistics Theory
What problem does this paper attempt to address?
The paper addresses the problem of minimizing convex functionals over the space of probability measures. Specifically, the authors consider minimizing an L2-convex objective functional \(F\) over the space \(P(\Theta)\) of all probability distributions on \(\Theta\), and propose a new implicit discretization scheme, the Implicit KL Proximal Descent (IKLPD) algorithm. The algorithm is obtained by discretizing a continuous-time gradient flow relative to the Kullback-Leibler (KL) divergence.

### Main Problems

1. **Non-parametric Maximum Likelihood Estimation (NPMLE)**:
   - This problem arises when estimating the mixing distribution of a mixture model and when using the empirical Bayes method to solve compound decision problems.
   - The goal is to minimize the average negative log-likelihood functional \(L_n(\rho)\), where \(\rho\) is the unknown mixing distribution.
   - The formula is:
     \[
     \hat{P}_n=\arg\min_{\rho\in P(\Theta)}L_n(\rho),\quad\text{with}\quad L_n(\rho):=\frac{1}{n}\sum_{i=1}^n-\log\left(\int_\Theta p(X_i|\theta)\,d\rho(\theta)\right)
     \]
   - \(L_n\) is L2-convex on \(P(\Theta)\), but is usually not displacement-convex.

2. **Bayesian Posterior Sampling**:
   - In Bayesian statistics, a core problem is to sample from the posterior distribution of the unknown parameters in order to estimate the parameters and construct the corresponding credible intervals.
   - The posterior distribution \(\pi_n\) can be identified as the minimizer of the KL divergence functional \(D_{\text{KL}}(\cdot\|\pi_n)\).
   - Equivalently, up to an additive constant that does not depend on \(\rho\), the formula is:
     \[
     \pi_n=\arg\min_{\rho\in P(\Theta)}\left\{\int_\Theta V_n(\theta)\,d\rho(\theta)+\int_\Theta\rho(\theta)\log\rho(\theta)\,d\theta\right\}
     \]
     where \(V_n(\theta)=-\log\pi(\theta)-\sum_{i=1}^n\log p(X_i|\theta)\) and \(\pi\) denotes the prior density.

### Paper Contributions

- Proposes the Implicit KL Proximal Descent (IKLPD) algorithm, which discretizes the continuous-time gradient flow relative to the KL divergence in order to minimize a general L2-convex functional \(F\).
- Proves that, under the L2-convexity condition alone, IKLPD converges to a global optimum at a polynomial rate from any initialization, and that if \(F\) is strongly convex relative to the KL divergence, IKLPD exhibits global exponential convergence.
- Proposes a numerical method based on normalizing flows to implement IKLPD, which trains a normalizing flow layer by layer to minimize the convex functional \(F\).
- Analyzes the convergence of inexact IKLPD with non-zero numerical errors, as well as a stochastic version of IKLPD.

Taken together, the paper provides a new method with theoretical guarantees for minimizing convex functionals over the space of probability measures, with NPMLE and Bayesian posterior computation as the motivating applications.
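
The exponential-convergence claim for a KL target can be illustrated on a toy example. The sketch below is not the paper's normalizing-flow implementation. It assumes the implicit step takes the standard proximal form \(\rho_{k+1}=\arg\min_\rho F(\rho)+\frac{1}{\tau}D_{\text{KL}}(\rho\|\rho_k)\), which this summary does not state explicitly, and it works on a finite grid, where for \(F=D_{\text{KL}}(\cdot\|\pi)\) this step has the closed form \(\rho_{k+1}\propto\rho_k^{1/(1+\tau)}\pi^{\tau/(1+\tau)}\). The grid, the Gaussian-shaped target, the step size `tau`, and all function names are hypothetical choices made for illustration.

```python
import math

def normalize_log(log_w):
    """Turn unnormalized log-weights into a probability vector (log-sum-exp)."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    s = sum(w)
    return [v / s for v in w]

def kl(p, q):
    """Discrete KL divergence D_KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_prox_step(rho, target, tau):
    """One assumed implicit step for F = D_KL(. || target):
    argmin_r D_KL(r || target) + (1 / tau) * D_KL(r || rho).
    On a finite grid the minimizer is the normalized geometric mixture
    rho^(1/(1+tau)) * target^(tau/(1+tau))."""
    a = 1.0 / (1.0 + tau)
    log_w = [a * math.log(r) + (1.0 - a) * math.log(t)
             for r, t in zip(rho, target)]
    return normalize_log(log_w)

# Hypothetical target: a Gaussian-shaped "posterior" discretized on a grid.
grid = [-3.0 + 6.0 * i / 60 for i in range(61)]
target = normalize_log([-0.5 * ((x - 1.0) / 0.5) ** 2 for x in grid])

rho = [1.0 / len(grid)] * len(grid)  # uniform initialization
tau = 1.0                            # illustrative step size

history = [kl(rho, target)]          # objective value F(rho_k) per iteration
for _ in range(15):
    rho = kl_prox_step(rho, target, tau)
    history.append(kl(rho, target))
```

In this toy setting the values in `history` should decrease at every step and become very small after a handful of iterations, which is consistent with the exponential convergence described for objectives that are strongly convex relative to the KL divergence. For a general L2-convex \(F\), such as the NPMLE objective \(L_n\), no closed form of this kind is available, which is where a numerical scheme like the normalizing-flow method comes in.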