Kernel Stein Discrepancy on Lie Groups: Theory and Applications

Xiaoda Qu, Xiran Fan, Baba C. Vemuri
2024-09-18
Abstract: Distributional approximation is a fundamental problem in machine learning with numerous applications across all fields of science and engineering and beyond. The key challenge in most approximation methods is the need to tackle the intractable normalization constant pertaining to the parametrized distributions used to model the data. In this paper, we present a novel Stein operator on Lie groups leading to a kernel Stein discrepancy (KSD) which is a normalization-free loss function. We present several theoretical results characterizing the properties of this new KSD on Lie groups and its minimizers namely, the minimum KSD estimator (MKSDE). Proof of several properties of MKSDE are presented, including strong consistency, CLT and a closed form of the MKSDE for the von Mises-Fisher distribution on SO(N). Finally, we present experimental evidence depicting advantages of minimizing KSD over maximum likelihood estimation.
Statistics Theory, Probability
What problem does this paper attempt to address?
The paper addresses distributional approximation on Lie groups: how to fit a parametrized probability distribution on a Lie group without computing its normalization constant. Many distributions on Lie groups have intractable normalization constants, particularly for manifold-valued random variables such as rotation matrices and orthogonal matrices. These constants make traditional maximum likelihood estimation (MLE) complicated and inefficient in practice.

### Main contributions of the paper

1. **Proposing a new Stein operator**:
   - The authors propose a new Stein operator based on the Lie group structure. It is used to define the kernel Stein discrepancy (KSD), a loss function that does not require the normalization constant.
   - The operator is applicable to all Lie groups, and the paper characterizes the theoretical properties of the resulting KSD.
2. **Minimum Kernel Stein Discrepancy Estimator (MKSDE)**:
   - The paper defines the MKSDE as the minimizer of the KSD and proves its asymptotic properties, including strong consistency and a central limit theorem (CLT).
   - In the paper's experiments, the MKSDE gives more accurate parameter estimates than MLE, especially for distributions with intractable normalization constants.
3. **Theoretical results**:
   - The paper provides multiple theoretical results for the KSD and its minimizer (MKSDE), including strong consistency, a CLT, and closed-form solutions for specific distributions (such as the von Mises-Fisher distribution, the exponential distribution, and the Riemannian normal distribution).
4. **Experimental verification**:
   - Experiments show the advantages of the MKSDE in parameter estimation, especially for data on Lie groups such as rotation matrices.

### Formula presentation

- **Stein operator**:
  \[ A_p: f \mapsto \sum_{l = 1}^d \left[D_l f_l + f_l D_l \log p + f_l D_l \Delta\right], \quad f \in H_k^d \]
  where \(D_l\) is the left-invariant vector field and \(\Delta\) is the modular function.
- **Kernel Stein Discrepancy (KSD)**:
  \[ \text{KSD}(p, q) = \sup \left\{\mathbb{E}_q[A_p f]: f \in H_k^d, \|f\|_{H_k^d} \leq 1\right\} \]
- **Closed form of KSD**:
  \[ \text{KSD}^2(p, q) = \int_G \int_G k_p(x, y) q(x) q(y) \mu(dx) \mu(dy) \]
  where \(k_p(x, y) = \sum_{l = 1}^d \langle A_l^p k_x, A_l^p k_y \rangle_{H_k}\).

### Conclusion

The paper avoids the normalization constant problem in distribution approximation on Lie groups by proposing a new Stein operator and the corresponding KSD. The method comes with theoretical guarantees (strong consistency and a CLT for the MKSDE) and, in the paper's experiments, shows advantages over MLE, especially for data on Lie groups such as rotation matrices.
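
The double-integral form of \(\text{KSD}^2\) can be estimated from samples by averaging the Stein kernel \(k_p\) over all sample pairs. The sketch below illustrates this on the simplest Lie group, the circle \(SO(2)\) parametrized by an angle. It is not the authors' implementation, and several choices are assumptions made only for illustration: the target \(p\) is a von Mises density \(p(\theta) \propto \exp(\kappa \cos(\theta - \mu))\), the base kernel is \(k(x, y) = \exp(\cos(x - y))\), the single left-invariant vector field is \(d/d\theta\), and the function names `stein_kernel` and `ksd_squared` are hypothetical. Because the circle is compact, it is unimodular, so \(\Delta \equiv 1\) and the modular-function term vanishes.

```python
import numpy as np


def stein_kernel(x, y, kappa, mu):
    """Stein kernel k_p(x, y) on the circle for a von Mises(mu, kappa) target.

    x and y are arrays of angles (radians) that broadcast against each other.
    """
    d = x - y
    k = np.exp(np.cos(d))                    # assumed base kernel k(x, y)
    dk_dx = -np.sin(d) * k                   # D applied in the first argument
    dk_dy = np.sin(d) * k                    # D applied in the second argument
    d2k = (np.cos(d) - np.sin(d) ** 2) * k   # D applied in both arguments
    # Score D log p; the normalization constant drops out of the derivative.
    sx = -kappa * np.sin(x - mu)
    sy = -kappa * np.sin(y - mu)
    return d2k + sx * dk_dy + sy * dk_dx + sx * sy * k


def ksd_squared(samples, kappa, mu):
    """V-statistic estimate of KSD^2(p, q) from samples drawn from q."""
    x = np.asarray(samples, dtype=float)
    gram = stein_kernel(x[:, None], x[None, :], kappa, mu)
    return float(gram.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.vonmises(0.0, 2.0, size=500)   # samples from q
    print("matched target:   ", ksd_squared(data, kappa=2.0, mu=0.0))
    print("mismatched target:", ksd_squared(data, kappa=2.0, mu=1.5))
```

When the target parameters match the distribution that generated the samples, the estimate should be close to zero, and it should grow as the target moves away from the data. Minimizing `ksd_squared` over `kappa` and `mu` would give a minimum-KSD estimate in this toy setting, and at no point is the von Mises normalization constant evaluated.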