Telescoping Density-Ratio Estimation

Benjamin Rhodes,Kai Xu,Michael U. Gutmann
DOI: https://doi.org/10.48550/arXiv.2006.12204
2020-11-24
Abstract:Density-ratio estimation via classification is a cornerstone of unsupervised learning. It has provided the foundation for state-of-the-art methods in representation learning and generative modelling, with the number of use-cases continuing to proliferate. However, it suffers from a critical limitation: it fails to accurately estimate ratios p/q for which the two densities differ significantly. Empirically, we find this occurs whenever the KL divergence between p and q exceeds tens of nats. To resolve this limitation, we introduce a new framework, telescoping density-ratio estimation (TRE), that enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces. Our experiments demonstrate that TRE can yield substantial improvements over existing single-ratio methods for mutual information estimation, representation learning and energy-based modelling.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the limitation of density - ratio estimation when dealing with highly different distributions. Specifically, the traditional density - ratio estimation method implemented through classification cannot accurately estimate the density ratio when the two distributions are very different (for example, when the KL divergence between them exceeds dozens of nats). This phenomenon is called the "density canyon" problem. The paper proposes a new framework - Telescoping Density - Ratio Estimation (TRE), aiming to overcome this problem and be able to more accurately estimate the density ratio between highly different distributions in high - dimensional space. ### Main Contributions 1. **Introducing the TRE Framework**: TRE gradually approximates the target density ratio by decomposing a large density ratio into a series of smaller density ratios, thus solving the inaccuracy problem of traditional methods when dealing with highly different distributions. 2. **Improving Sample Efficiency**: Experiments show that TRE is significantly superior to existing single - density - ratio methods in tasks such as estimating mutual information, representation learning, and energy - based modeling, especially more prominent in high - dimensional data. 3. **Expanding the Application Range**: TRE is applicable not only to continuous variables but also to discrete variables, which gives it potential application value in fields such as natural language processing. ### Solution The core idea of TRE is to use the "divide - and - conquer" strategy. The specific steps are as follows: 1. **Waymark Creation**: Transition gradually from samples of distribution \(p\) to samples of distribution \(q\) to generate a series of intermediate data sets. Each intermediate data set can be regarded as samples drawn from an implicit distribution \(p_k\). 2. **Bridge - Building**: Learn the density ratio \(r_k(x; \theta_k)\approx\frac{p_k(x)}{p_{k + 1}(x)}\) between adjacent intermediate points through classification tasks. These density - ratio models are called "bridges". 3. **Combining Bridges**: Multiply the estimated values of all bridges to obtain the estimated value of the original density ratio \(r(x; \theta)=\prod_{k = 0}^{m - 1}r_k(x; \theta_k)\approx\frac{p_0(x)}{p_m(x)}\). ### Experimental Verification The paper verifies the effectiveness of TRE through multiple experiments: - **One - Dimensional Spike Ratio**: Demonstrates the advantages of TRE in dealing with extremely spiky distributions. - **High - Dimensional Ratio and Large Mutual Information**: On high - dimensional Gaussian distributions, TRE can accurately estimate mutual information as high as 80 nats, while traditional methods become very inaccurate above 20 nats. - **Mutual Information Estimation and Representation Learning**: On the SpatialMultiOmniglot data set, TRE performs excellently in estimating high mutual information values and learning high - quality representations. - **Energy - Based Modeling**: On the MNIST data set, TRE can effectively learn energy - based models, especially when using simple noise distributions, and its performance is significantly better than that of traditional methods. In conclusion, by introducing the TRE framework, this paper successfully solves the limitations of traditional density - ratio estimation methods when dealing with highly different distributions and provides a new solution for density - ratio estimation of high - dimensional data.