co-BPM: a Bayesian Model for Divergence Estimation

Kun Yang, Hao Su, Wing Hung Wong
DOI: https://doi.org/10.48550/arXiv.1410.0726
2014-10-02
Computation
Abstract: Divergence is not only an important mathematical concept in information theory, but is also applied to machine learning problems such as low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection. We propose a Bayesian model, co-BPM, to characterize the discrepancy between two sample sets, i.e., to estimate the divergence of their underlying distributions. To avoid the pitfalls of plug-in methods that estimate each density independently, our Bayesian model learns a coupled binary partition of the sample space that best captures the landscapes of both distributions, and then makes direct inference on their divergence. The prior is constructed by leveraging the sequential buildup of the coupled binary partitions, and the posterior is sampled via our specialized MCMC. Our model provides a unified way to estimate various types of divergences and enjoys convincing accuracy. We demonstrate its effectiveness through simulations, comparisons with the state of the art, and a real data example.
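To illustrate why a shared partition allows direct inference on a divergence, here is a minimal sketch, not the authors' co-BPM implementation. It assumes a fixed one-dimensional partition (co-BPM would instead learn the partition and sample it by MCMC) and piecewise-constant densities on that partition. Because both densities are constant on the same regions, the region volumes cancel in the density ratio, and the KL divergence reduces to a sum over regions. The function name `kl_on_shared_partition`, the pseudo-count `alpha`, and the Beta-distributed test data are all hypothetical choices made for this illustration.

```python
import numpy as np

def kl_on_shared_partition(x, y, edges, alpha=0.5):
    """Estimate KL(p || q) from samples x ~ p and y ~ q using
    piecewise-constant densities on one shared set of 1-D regions.

    `edges` defines the shared partition; `alpha` is a hypothetical
    pseudo-count that keeps every region probability strictly positive.
    """
    # Count the samples of each set in the SAME regions.
    n1, _ = np.histogram(x, bins=edges)
    n2, _ = np.histogram(y, bins=edges)

    # Smoothed probability mass of each region under each distribution.
    theta1 = (n1 + alpha) / (n1.sum() + alpha * len(n1))
    theta2 = (n2 + alpha) / (n2.sum() + alpha * len(n2))

    # Density in region j is theta_j / volume_j for both distributions,
    # so the volumes cancel in the ratio and the integral becomes a sum.
    return float(np.sum(theta1 * np.log(theta1 / theta2)))

rng = np.random.default_rng(0)
edges = np.linspace(0.0, 1.0, 9)      # fixed partition of [0, 1] into 8 regions
x = rng.beta(2.0, 5.0, size=2000)     # sample set from the first distribution
y = rng.beta(5.0, 2.0, size=2000)     # sample set from the second distribution

kl_xy = kl_on_shared_partition(x, y, edges)  # clearly positive for these two sets
kl_xx = kl_on_shared_partition(x, x, edges)  # zero: identical region masses
print(kl_xy, kl_xx)
```

Other divergences of the f-divergence family could be computed from the same two vectors of region masses by swapping the summand, which is one way to read the abstract's claim of a unified estimator.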