Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

Frank Nielsen
2024-06-10
Abstract: Data sets of multivariate normal distributions abound in many scientific areas such as diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, to name just a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance; however, it is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance with its associated straight-line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses how to measure distances between multivariate normal distributions. When data sets of normals are processed for downstream tasks such as filtering, classification, or clustering, one needs both a suitable dissimilarity between normals and smooth paths joining them. The paper focuses on two problems:

1. **Computing the Fisher-Rao distance**
   - The Fisher-Rao distance is the Riemannian geodesic distance induced by the Fisher information metric. For multivariate normal distributions it has no known closed form except in a few special cases (such as the univariate case, or when the two normals share the same mean or the same covariance matrix). The paper therefore first proposes a fast and robust method that approximates the Fisher-Rao distance between multivariate normal distributions within a factor of \(1+\epsilon\) for any \(\epsilon > 0\).
2. **Introducing new distances**
   - The paper introduces a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite (SPD) cone. The projective Hilbert distance on the cone gives a metric on the embedded submanifold. Pulling this cone distance back, together with its straight-line Hilbert cone geodesics, yields a distance and smooth paths between normal distributions.
   - This new distance, called the pullback Hilbert cone distance, is computationally light because it only requires the minimal and maximal eigenvalues of a matrix.

### Main contributions of the paper

- **Approximation method for the Fisher-Rao distance**: A fast method with a guaranteed approximation factor.
- **Introduction of new distances**: The pullback Hilbert cone distance, which is cheap to compute and comes with straight-line geodesics on the cone.
- **Application examples**: Shows how to use these distances and paths in clustering tasks, for example to simplify and quantize Gaussian mixture models (GMMs).

### Formula summary

1. **Fisher-Rao distance**:
   \[ \rho_{FR}(N_0, N_1)=\int_{0}^{1} ds_{Fisher}(\gamma_{FR}(N_0, N_1; t))\,dt \]
   where \(ds_{Fisher}\) is the Fisher length element and \(\gamma_{FR}(N_0, N_1; t)\) is the Fisher-Rao geodesic from \(N_0\) to \(N_1\).
2. **Pullback Hilbert cone distance**:
   - Embedding of a \(d\)-variate normal into the \((d+1)\times(d+1)\) SPD cone, with \(a > 0\):
     \[ f_a(N(\mu,\Sigma))=\begin{bmatrix}\Sigma + a\mu\mu^{\top} & a\mu\\ a\mu^{\top} & a\end{bmatrix} \]
   - Projective Hilbert distance between the embedded matrices \(P_0=f_a(N_0)\) and \(P_1=f_a(N_1)\), which uses only the extreme eigenvalues:
     \[ \rho_{Hilbert}(P_0, P_1)=\log\frac{\lambda_{\max}(P_0^{-1/2}P_1P_0^{-1/2})}{\lambda_{\min}(P_0^{-1/2}P_1P_0^{-1/2})} \]
   - For comparison, the Riemannian distance on the SPD cone uses all \(d+1\) eigenvalues:
     \[ \rho_P(P_0, P_1)=\sqrt{\sum_{i = 1}^{d + 1}\log^{2}\lambda_i(P_0^{-1/2}P_1P_0^{-1/2})} \]

Together, these methods provide practical tools and theoretical support for processing data sets of multivariate normal distributions.
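
As a rough illustration of the pullback Hilbert cone distance, here is a minimal NumPy sketch. The function names (`embed`, `hilbert_cone_distance`, `pullback_hilbert_distance`) are hypothetical and not taken from the paper. The sketch assumes `a > 0` so that the embedded matrix is positive-definite, and for simplicity it computes the full spectrum after Cholesky whitening rather than using a dedicated extreme-eigenvalue solver.

```python
import numpy as np


def embed(mu, Sigma, a=1.0):
    """Embed N(mu, Sigma) as a (d+1)x(d+1) SPD matrix (assumes a > 0)."""
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    Sigma = np.asarray(Sigma, dtype=float)
    top = np.hstack([Sigma + a * (mu @ mu.T), a * mu])
    bottom = np.hstack([a * mu.T, np.array([[a]])])
    return np.vstack([top, bottom])


def hilbert_cone_distance(P0, P1):
    """Hilbert projective distance log(lambda_max / lambda_min) on the SPD cone."""
    # Whiten with the Cholesky factor L of P0: L^{-1} P1 L^{-T} has the same
    # eigenvalues as P0^{-1/2} P1 P0^{-1/2}.
    L = np.linalg.cholesky(P0)
    X = np.linalg.solve(L, P1)      # L^{-1} P1
    M = np.linalg.solve(L, X.T)     # L^{-1} P1 L^{-T}, using symmetry of P1
    M = 0.5 * (M + M.T)             # remove round-off asymmetry
    lam = np.linalg.eigvalsh(M)     # eigenvalues in ascending order
    return float(np.log(lam[-1] / lam[0]))


def pullback_hilbert_distance(mu0, Sigma0, mu1, Sigma1, a=1.0):
    """Distance between two normals: Hilbert distance of their embeddings."""
    return hilbert_cone_distance(embed(mu0, Sigma0, a), embed(mu1, Sigma1, a))
```

For example, `pullback_hilbert_distance([0.0], [[1.0]], [0.0], [[4.0]])` compares the univariate normals N(0, 1) and N(0, 4). With `a = 1` the embeddings are diag(1, 1) and diag(4, 1), so the result is log 4. On the full cone the Hilbert distance is only projective (it ignores positive rescaling of either matrix); per the summary above, it becomes a metric once restricted to the embedded normal submanifold.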