Optimal ROC-Based Classification and Performance Analysis under Bayesian Uncertainty Models

Lori A. Dalton
DOI: https://doi.org/10.1109/TCBB.2015.2465966
2016-07-01
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Popular tools to evaluate classifier performance are the false positive rate (FPR), true positive rate (TPR), receiver operator characteristic (ROC) curve, and area under the curve (AUC). Typically, these quantities are estimated from training data using simple resampling and counting methods, which have been shown to perform poorly when the sample size is small, as is typical in many applications. This work takes a model-based approach to classifier training and performance analysis, in which the true population densities are assumed to be members of an uncertainty class of distributions. Given a prior over the uncertainty class and data, we form a posterior and derive optimal mean-squared-error (MSE) FPR and TPR estimators, as well as the sample-conditioned MSE performance of these estimators. The theory also leads naturally to optimal ROC and AUC estimators. Finally, we develop a Neyman-Pearson-based approach to optimal classifier design, which maximizes the estimated TPR for a given estimated FPR. These tools are optimal over the uncertainty class of distributions given the sample, and are available in closed form or can be easily approximated for many models. Applications are demonstrated on both synthetic and real genomic data. MATLAB code and simulation results are available in the online supplementary material.
computer science, interdisciplinary applications, biochemical research methods, mathematics, statistics & probability
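
For orientation, the simple counting estimates that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the conventional baseline only, not the paper's Bayesian MSE-optimal estimators; the function names `empirical_rates` and `empirical_roc_auc`, the score-thresholding rule, and the toy data are assumptions introduced here.

```python
import numpy as np


def empirical_rates(scores, labels, threshold):
    """Counting estimates of (FPR, TPR) for the rule: predict class 1 if score >= threshold.

    Hypothetical helper for illustration; assumes both classes appear in `labels`.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predicted_positive = scores >= threshold
    # FPR: fraction of class-0 points predicted positive.
    fpr = float(np.mean(predicted_positive[labels == 0]))
    # TPR: fraction of class-1 points predicted positive.
    tpr = float(np.mean(predicted_positive[labels == 1]))
    return fpr, tpr


def empirical_roc_auc(scores, labels):
    """Empirical ROC points (swept over observed scores) and trapezoidal AUC."""
    scores = np.asarray(scores, dtype=float)
    # Start above every score so the curve begins at (0, 0), then lower the threshold.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    points = np.array([empirical_rates(scores, labels, t) for t in thresholds])
    fpr, tpr = points[:, 0], points[:, 1]
    # Trapezoidal rule over the piecewise-linear empirical ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


# Toy data (made up for illustration): four scored points, two per class.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_fpr, toy_tpr = empirical_rates(toy_scores, toy_labels, threshold=0.5)
roc_fpr, roc_tpr, toy_auc = empirical_roc_auc(toy_scores, toy_labels)
```

With only a handful of points per class, each estimate can take only a few distinct values, which gives some intuition for why such counting estimates can be unreliable at small sample sizes.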