Abstract: The density ratio of two probability distributions is one of the fundamental tools in mathematical and computational statistics and machine learning, and it has a variety of known applications. Therefore, density ratio estimation from finite samples is a very important task, but it is known to be unstable when the distributions are distant from each other. One approach to address this problem is density ratio estimation using incremental mixtures of the two distributions. We geometrically reinterpret existing methods for density ratio estimation based on incremental mixtures. We show that these methods can be regarded as iterating on the Riemannian manifold along a particular curve between the two probability distributions. Making use of the geometry of the manifold, we propose to consider incremental density ratio estimation along generalized geodesics on this manifold. Achieving such a method requires Monte Carlo sampling along geodesics via transformations of the two distributions. We show how to implement an iterative algorithm to sample along these geodesics and show how changing the distances along the geodesic affects the variance and accuracy of the estimation of the density ratio. Our experiments demonstrate that the proposed approach outperforms the existing approaches using incremental mixtures that do not take the geometry of the
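The incremental-mixture idea rests on a telescoping identity: with intermediate densities \(\pi_k = (1-k/m)\,p + (k/m)\,q\) for \(k = 0,\dots,m\), the full ratio \(p/q\) equals the product of the \(m\) bridge ratios \(\pi_k/\pi_{k+1}\), each taken between two nearby distributions. The sketch below checks that identity with exact densities, assuming the hypothetical pair \(p = N(0,1)\), \(q = N(3,1)\); all helper names are illustrative. In practice each bridge ratio would be estimated from samples (for example with a probabilistic classifier), and `sample_mixture` shows why samples from mixture intermediates are easy to obtain.

```python
import numpy as np

# Hypothetical setup: p = N(0, 1), q = N(3, 1), both with unit variance.
MEAN_P, MEAN_Q = 0.0, 3.0


def gauss_pdf(x, mean):
    """Unit-variance Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def mixture_path(p, q, m):
    """Densities pi_0 = p, ..., pi_m = q with pi_k = (1 - k/m) p + (k/m) q."""
    lams = np.linspace(0.0, 1.0, m + 1)
    return [(1.0 - lam) * p + lam * q for lam in lams]


def telescoped_ratio(path):
    """Multiply the bridge ratios pi_k / pi_{k+1}; the product equals pi_0 / pi_m."""
    ratio = np.ones_like(path[0])
    for k in range(len(path) - 1):
        ratio *= path[k] / path[k + 1]
    return ratio


def sample_mixture(rng, lam, n):
    """Ancestral sampling from (1 - lam) p + lam q: pick a component, then draw from it."""
    from_q = rng.random(n) < lam
    return rng.normal(np.where(from_q, MEAN_Q, MEAN_P), 1.0)


x = np.linspace(-3.0, 6.0, 7)
p, q = gauss_pdf(x, MEAN_P), gauss_pdf(x, MEAN_Q)
path = mixture_path(p, q, m=5)
print(np.allclose(telescoped_ratio(path), p / q))  # True
```

Drawing from an intermediate mixture only requires choosing a component and sampling from it, so the mean of `sample_mixture(rng, 0.4, n)` should be close to \(0.4 \times 3 = 1.2\). This convenience is specific to the mixture curve; other curves between \(p\) and \(q\) generally have no such direct sampler.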

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to estimate the density ratio between two probability distributions stably when they are far apart. Traditional density ratio estimation methods often become unstable when the source and target distributions differ substantially, which leads to inaccurate estimates. To overcome this challenge, the paper proposes Generalized Incremental Mixture Density Ratio Estimation (GIMDRE), a method based on generalized geodesics on the statistical manifold.

### Main contributions:

1. **Geometric reinterpretation**: The paper reinterprets existing Incremental Mixture Density Ratio Estimation (IMDRE) methods from the perspective of information geometry. It shows that these methods can be understood as sequential density ratio estimation along a specific curve on the statistical manifold, the m-geodesic (the mixture curve \((1-\lambda)p+\lambda q\)).
2. **Generalized geodesics**: The paper proposes using generalized geodesics (α-geodesics) for density ratio estimation. α-geodesics form a more flexible family of curves connecting two probability distributions on the statistical manifold, and choosing an appropriate α lets the method adapt to different estimation settings.
3. **Optimization algorithm**: To implement GIMDRE, the paper develops an alternating optimization algorithm. Sampling at intermediate points of an α-geodesic depends on the density ratio, while estimating the density ratio depends on those samples; the algorithm resolves this interdependence by alternately estimating the density ratio and updating the sampling weights through Monte Carlo sampling and importance weighting.
4. **Numerical experiments**: The paper presents a series of numerical experiments to verify the effectiveness and behavior of GIMDRE. The results show that, compared with the traditional IMDRE method, GIMDRE provides more accurate and stable density ratio estimates in various settings.
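One way to see why sampling and ratio estimation are intertwined: dividing the α-geodesic by \(q\) gives \(\gamma^{(\alpha)}(\lambda)(x) = q(x)\,\big((1-\lambda)\,r(x)^{(1-\alpha)/2} + \lambda\big)^{2/(1-\alpha)}\) with \(r = p/q\) (and \(q(x)\,r(x)^{1-\lambda}\) for \(\alpha = 1\)), so samples from \(q\) can be reweighted toward any point on the curve once \(r\) is known. The sketch below is a minimal self-normalized importance sampler under that assumption; it is not the paper's algorithm. The exact ratio for the hypothetical pair \(p = N(0,1)\), \(q = N(2,1)\) stands in for the estimate that an alternating scheme would supply, and the function names are hypothetical.

```python
import numpy as np


def geodesic_weights(r, lam, alpha):
    """Weights w with gamma^(alpha)(lam)(x) = q(x) * w(x), given r(x) = p(x) / q(x).

    Unnormalized: the self-normalized estimator absorbs the missing constant.
    """
    if alpha == 1.0:
        # e-geodesic: p^(1-lam) q^lam = q * r^(1-lam)
        return r ** (1.0 - lam)
    e = (1.0 - alpha) / 2.0
    # ((1-lam) p^e + lam q^e)^(1/e) = q * ((1-lam) r^e + lam)^(1/e)
    return ((1.0 - lam) * r ** e + lam) ** (1.0 / e)


def weighted_mean(xs, w):
    """Self-normalized importance-sampling estimate of the mean under the target."""
    return float(np.sum(w * xs) / np.sum(w))


def ess(w):
    """Effective sample size (sum w)^2 / sum w^2 of a set of importance weights."""
    return float(np.sum(w) ** 2 / np.sum(w ** 2))


rng = np.random.default_rng(0)
mu = 2.0
xs = rng.normal(mu, 1.0, size=400_000)             # samples from q = N(mu, 1)
r = np.exp(-0.5 * xs ** 2 + 0.5 * (xs - mu) ** 2)  # exact p/q for p = N(0, 1)

for alpha in (-1.0, 1.0):
    w = geodesic_weights(r, 0.5, alpha)
    print(alpha, weighted_mean(xs, w), ess(w) / xs.size)
```

At \(\lambda = 0.5\) the α = −1 point is the mixture \(0.5p + 0.5q\) and the α = 1 point is \(N(1,1)\); both have mean 1, so both printed estimates should be close to 1.0. Moving \(\lambda\) toward 0 (toward \(p\)) lowers the effective sample size, which illustrates how the spacing of points along the geodesic affects the variance of the reweighted estimates.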
### Mathematical background:

- **Density ratio**: Suppose \(p_s(x)\) and \(p_t(x)\) are the probability density functions of the source and target distributions, respectively. The density ratio is defined as \(r(x)=\frac{p_s(x)}{p_t(x)}\).
- **α-geodesics**: On the statistical manifold, α-geodesics are curves connecting two probability distributions \(p\) and \(q\), parameterized by \(\lambda\in[0,1]\). They take the form
  \[
  \gamma^{(\alpha)}(\lambda)=
  \begin{cases}
  \left((1-\lambda)p(x)^{\frac{1-\alpha}{2}}+\lambda q(x)^{\frac{1-\alpha}{2}}\right)^{\frac{2}{1-\alpha}}, & \text{if }\alpha\neq1\\
  \exp\left((1-\lambda)\ln p(x)+\lambda\ln q(x)\right), & \text{if }\alpha=1
  \end{cases}
  \]
  Setting \(\alpha=-1\) gives the mixture \((1-\lambda)p(x)+\lambda q(x)\) (the m-geodesic), and \(\alpha=1\) gives the geometric mixture (the e-geodesic). For \(\alpha\neq-1\) the right-hand side is generally not normalized, so it must be divided by its integral to obtain a probability density.
- **α-divergence**: The α-divergence measures the difference between two probability distributions and, for \(\alpha\notin\{0,1\}\), is defined as
  \[
  D_{\alpha}[p\|q]=\frac{1}{\alpha(1-\alpha)}\left(1-\int p(x)^{\alpha}q(x)^{1-\alpha}\,dx\right)
  \]

### Experimental results:

- **Evaluation of different step sizes**: Table 1 reports GIMDRE results for different step sizes \(m\). Even with a small step size, GIMDRE is significantly better than the traditional method, and as the step size increases, the mean and standard deviation of the estimates improve further.
- **Influence of different α values**: Tables 2 and 3 report results for different α values under different sample sizes and dimensions.
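The α-geodesic and α-divergence formulas from the mathematical background can be checked numerically on a grid. The sketch below assumes the hypothetical pair \(p = N(0,1)\), \(q = N(2,1)\) and uses illustrative helper names; it normalizes each geodesic point by a Riemann sum. Note that the two formulas parameterize α differently: the geodesic uses the exponent \((1-\alpha)/2\), while the divergence uses \(p^{\alpha}q^{1-\alpha}\). For this pair the closed forms are \(D_{0.5} = 4(1-e^{-1/2})\) and \(D_2 = (e^4-1)/2\), and the e-geodesic midpoint is \(N(1,1)\).

```python
import numpy as np


def alpha_geodesic(p, q, lam, alpha, dx):
    """Point gamma^(alpha)(lam) evaluated on a grid, renormalized to integrate to one."""
    if alpha == 1.0:
        g = np.exp((1.0 - lam) * np.log(p) + lam * np.log(q))  # e-geodesic
    else:
        e = (1.0 - alpha) / 2.0
        g = ((1.0 - lam) * p ** e + lam * q ** e) ** (1.0 / e)
    return g / (g.sum() * dx)


def alpha_divergence(p, q, alpha, dx):
    """D_alpha[p||q] = (1 - int p^alpha q^(1-alpha) dx) / (alpha (1 - alpha)), alpha not in {0, 1}."""
    integral = np.sum(p ** alpha * q ** (1.0 - alpha)) * dx
    return (1.0 - integral) / (alpha * (1.0 - alpha))


# Grid wide enough that the Gaussian tails (and p^2 / q for alpha = 2) are negligible.
x = np.linspace(-12.0, 14.0, 26001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)          # N(0, 1)
q = np.exp(-0.5 * (x - 2.0) ** 2) / np.sqrt(2.0 * np.pi)  # N(2, 1)

g_e = alpha_geodesic(p, q, 0.5, 1.0, dx)
print("e-geodesic midpoint mean:", np.sum(x * g_e) * dx)
for a in (0.5, 2.0):
    print("alpha =", a, "divergence:", alpha_divergence(p, q, a, dx))
```

Both divergence values come out positive, as a divergence should; with the prefactor \(1/(\alpha(\alpha-1))\) instead, the sign would flip for \(0<\alpha<1\).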

Density Ratio Estimation via Sampling along Generalized Geodesics on Statistical Manifolds
