Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities
Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein
DOI: https://doi.org/10.48550/arXiv.1907.00325
2021-10-06
Abstract: Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees in certain simple scenarios. However, existing approaches fail in high-dimensional settings and when different features are measured on different scales. We propose decision forest-based adaptive nearest neighbor estimators and show that they are able to effectively estimate posterior probabilities, conditional entropies, and mutual information even in the aforementioned settings. We provide an extensive study of efficacy for classification and posterior probability estimation, and prove certain forest-based approaches to be consistent estimators of the true posteriors and derived information-theoretic quantities under certain assumptions. In a real-world connectome application, we quantify the uncertainty about neuron type given various cellular features in the Drosophila larva mushroom body, a key challenge for modern neuroscience.
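To make the quantities in the abstract concrete, the following is a minimal sketch of a plug-in estimate: a forest supplies posterior estimates P(Y | X = x), the conditional entropy H(Y | X) is approximated by the average entropy of those posteriors, and the mutual information follows as I(X; Y) = H(Y) - H(Y | X). This is an illustration only and not necessarily the estimator the paper proposes or analyzes. It assumes scikit-learn's `RandomForestClassifier` with out-of-bag posteriors as a stand-in, and the names `entropy`, `forest_information_estimates`, and `make_demo_data` are hypothetical helpers introduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def entropy(p, axis=-1):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -np.sum(p * logp, axis=axis)


def forest_information_estimates(X, y, n_estimators=200, seed=0):
    """Plug-in estimates of H(Y), H(Y|X), and I(X;Y) from forest posteriors.

    Assumption: a standard scikit-learn random forest with out-of-bag
    (OOB) posteriors, used here only as an illustrative stand-in.
    """
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        min_samples_leaf=5,   # keep several samples per leaf for smoother posteriors
        bootstrap=True,
        oob_score=True,       # exposes oob_decision_function_
        random_state=seed,
    )
    forest.fit(X, y)

    # OOB posteriors: each row is estimated only by trees that did not
    # see that sample, which reduces overconfident (0/1) posteriors.
    posteriors = forest.oob_decision_function_
    valid = ~np.isnan(posteriors).any(axis=1)  # rows never left out of bag
    posteriors = posteriors[valid]

    # Marginal entropy H(Y) from empirical class frequencies.
    _, counts = np.unique(y, return_counts=True)
    h_y = entropy(counts / counts.sum())

    # Conditional entropy H(Y|X) as the mean entropy of the posteriors.
    h_y_given_x = float(np.mean(entropy(posteriors, axis=1)))

    return h_y, h_y_given_x, h_y - h_y_given_x


def make_demo_data(n=600, seed=0):
    """Hypothetical data: one informative feature, two noise features on very different scales."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n)
    noise_small = 1e-3 * rng.normal(size=n)
    noise_large = 1e3 * rng.normal(size=n)
    X = np.column_stack([signal, noise_small, noise_large])
    y = (signal > 0).astype(int)
    return X, y
```

Calling `forest_information_estimates(*make_demo_data())` returns the three estimates in nats. Because tree splits depend only on the ordering of each feature's values, rescaling a feature does not change the fitted partition, which is one reason forest-based neighborhoods are a natural fit for features measured on different scales.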
Machine Learning