Abstract: Quantifying the difference between probability distributions is crucial in machine learning. However, estimating statistical divergences from empirical samples is challenging due to unknown underlying distributions. This work proposes the representation Jensen-Shannon divergence (RJSD), a novel measure inspired by the traditional Jensen-Shannon divergence. Our approach embeds data into a reproducing kernel Hilbert space (RKHS), representing distributions through uncentered covariance operators. We then compute the Jensen-Shannon divergence between these operators, thereby establishing a proper divergence measure between probability distributions in the input space. We provide estimators based on kernel matrices and empirical covariance matrices using Fourier features. Theoretical analysis reveals that RJSD is a lower bound on the Jensen-Shannon divergence, enabling variational estimation. Additionally, we show that RJSD is a higher-order extension of the maximum mean discrepancy (MMD), providing a more sensitive measure of distributional differences. Our experimental results demonstrate RJSD's superiority in two-sample testing, distribution shift detection, and unsupervised domain adaptation, outperforming state-of-the-art techniques. RJSD's versatility and effectiveness make it a promising tool for machine learning research and applications.

What problem does this paper attempt to address?

The paper addresses the problem of quantifying differences between probability distributions in machine learning. Because the underlying data distribution is usually unknown in practice, statistical divergences such as the Jensen-Shannon divergence are difficult to estimate from empirical samples. The paper proposes a new measure, the Representation Jensen-Shannon Divergence (RJSD). The method embeds data into a reproducing kernel Hilbert space (RKHS) and represents each distribution by its uncentered covariance operator, which yields a proper divergence between probability distributions in the input space. The paper also provides estimators based on kernel matrices and on empirical covariance matrices, and proves that RJSD is a lower bound on the classical Jensen-Shannon divergence, which permits variational estimation. According to the reported experiments, RJSD outperforms existing state-of-the-art techniques in two-sample testing, distribution shift detection, and unsupervised domain adaptation.

### Formula and Symbol Explanation

- **Jensen-Shannon Divergence**:
  \[ D_{\text{JS}}(P, Q) = H\left(\frac{P + Q}{2}\right)-\frac{1}{2}\left(H(P)+H(Q)\right) \]
  where \(H(P)\) is the Shannon entropy of the probability distribution \(P\).
- **Representation Jensen-Shannon Divergence (RJSD)**:
  \[ D_{\text{H}}^{\text{JS}}(P, Q) = D_{\text{JS}}(C_P, C_Q) = S\left(\frac{C_P + C_Q}{2}\right)-\frac{1}{2}\left(S(C_P)+S(C_Q)\right) \]
  where \(C_P\) and \(C_Q\) are the uncentered covariance operators of the probability distributions \(P\) and \(Q\) in the RKHS, and \(S(C)\) is the von Neumann entropy of the covariance operator \(C\).
- **Maximum Mean Discrepancy (MMD)**:
  \[ \text{MMD}_\kappa^2(P, Q)=\|\mu_P-\mu_Q\|^2_H \]
  where \(\mu_P\) and \(\mu_Q\) are the mean embeddings of the probability distributions \(P\) and \(Q\) in the RKHS.
- **Kernel Matrix**:
  \[ K_X=\left[\kappa(x_i, x_j)\right]_{i, j = 1}^n \]
  where \(\kappa\) is a kernel function and \(X = \{x_1, x_2,\ldots, x_n\}\) is a sample set.

### Main Contributions

1. **Extension of Jensen-Shannon Divergence**: Extends the traditional Jensen-Shannon divergence to infinite-dimensional covariance operators and defines RJSD.
2. **Avoiding Density Estimation**: Maps data to an RKHS and represents distributions by uncentered covariance operators, avoiding direct estimation of the underlying density function.
3. **Estimators**: Proposes a sample-based RJSD estimator and discusses its consistency.
4. **Relationship with MMD**: Establishes the connection between RJSD and MMD and proves that MMD can be regarded as a special case of RJSD.
5. **Variational Estimation**: Proves that RJSD is a lower bound on the classical Jensen-Shannon divergence, so that a variational estimator can be constructed.

### Experimental Results

The paper reports that RJSD performs well in two-sample testing, distribution shift detection, and unsupervised domain adaptation, outperforming existing state-of-the-art techniques on these tasks.
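
The kernel-matrix route to the RJSD formula above can be illustrated with a minimal sketch. It relies on the standard fact that the empirical uncentered covariance operator of a sample shares its nonzero eigenvalues with \(K_X / n\), so the von Neumann entropy can be computed from the kernel matrix alone. Several choices here are assumptions made for illustration and are not taken from the paper: the Gaussian kernel (chosen because \(\kappa(x, x) = 1\) gives \(K_X / n\) unit trace), the bandwidth `sigma`, the equal-sample-size restriction (so that the pooled sample represents \((C_P + C_Q)/2\)), the eigenvalue cutoff, and the function names `gaussian_kernel`, `von_neumann_entropy`, `rjsd`, and `mmd2`. It should not be read as the authors' reference implementation.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and b (assumed kernel choice)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative squared distances caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def von_neumann_entropy(k):
    """Entropy -sum(l * log(l)) over the eigenvalues l of K / n."""
    eig = np.linalg.eigvalsh(k / k.shape[0])
    # Drop numerically zero eigenvalues, using the convention 0 * log(0) = 0.
    eig = eig[eig > 1e-12]
    return float(-np.sum(eig * np.log(eig)))


def rjsd(x, y, sigma=1.0):
    """Kernel-matrix sketch of S((C_P + C_Q) / 2) - (S(C_P) + S(C_Q)) / 2."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("this sketch assumes equal sample sizes")
    # With equal sizes, the pooled sample's covariance is (C_P + C_Q) / 2.
    z = np.vstack([x, y])
    s_mix = von_neumann_entropy(gaussian_kernel(z, z, sigma))
    s_x = von_neumann_entropy(gaussian_kernel(x, x, sigma))
    s_y = von_neumann_entropy(gaussian_kernel(y, y, sigma))
    return s_mix - 0.5 * (s_x + s_y)


def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD, shown for comparison."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(0.0, 1.0, size=(100, 2))
    same_b = rng.normal(0.0, 1.0, size=(100, 2))
    shifted = rng.normal(3.0, 1.0, size=(100, 2))
    print("RJSD, same distribution:   ", rjsd(same_a, same_b))
    print("RJSD, shifted distribution:", rjsd(same_a, shifted))
    print("MMD^2, shifted distribution:", mmd2(same_a, shifted))
```

Both quantities should be larger for the shifted pair than for the two samples drawn from the same distribution. In this sketch the RJSD estimate is bounded above by \(\log 2\), while the MMD estimate uses only the mean embeddings.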

The Representation Jensen-Shannon Divergence

R-divergence for Estimating Model-oriented Distribution Discrepancy

Jensen-variance distance measure: a unified framework for statistical and information measures

Measuring Generalized Divergence for Multiple Distributions with Application to Deep Clustering

Relative Density-Ratio Estimation for Robust Distribution Comparison

The Logarithmic Super Divergence and its use in Statistical Inference

Likelihood-free Model Choice for Simulator-based Models with the Jensen--Shannon Divergence

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

Synthetic Tabular Data Validation: A Divergence-Based Approach

Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data.

Computing Marginal and Conditional Divergences between Decomposable Models with Applications

Robust Covariance Estimators Based on Information Divergences and Riemannian Manifold.

The Information Volume of Uncertain Information: (5) Divergence Measure

A New Method To Measure The Divergence In Evidential Sensor Data Fusion

Towards a robust frequency-domain analysis: Spectral Rényi divergence revisited

Semi-Nonparametric Estimation of Distribution Divergence in Non-Euclidean Spaces

Distances and Riemannian Metrics for Multivariate Spectral Densities

Learning Log-Determinant Divergences for Positive Definite Matrices

Divergence Measure of Belief Function and Its Application in Data Fusion