Overcoming computational inability to predict clinical outcome from high-dimensional patient data using Bayesian methods

A Shalabi, A C C Coolen, E de Rinaldis
DOI: https://doi.org/10.48550/arXiv.1406.5062
2014-06-19
Abstract: Clinical outcome prediction from high-dimensional data is problematic in the common setting where there is only a relatively small number of samples. The imbalance causes data overfitting, and outcome prediction becomes computationally expensive or even impossible. We propose a Bayesian outcome prediction method that can be applied to data of arbitrary dimension d, from 2 outcome classes, and reduces overfitting without any approximations at the parameter level. This is achieved by avoiding numerical integration or approximation, and solving the Bayesian integrals analytically. We thereby reduce the dimension of numerical integrals from 2d dimensions to 4, for any d. For large d, this is reduced further to 3, and we obtain a simple outcome prediction formula without integrals in leading order for very large d. We compare our method to the mclustDA method (Fraley and Raftery 2002), using simulated and real data sets. Our method performs as well as or better than mclustDA in low dimensions d. In large dimensions d, mclustDA breaks down due to computational limitations, while our method provides a feasible and computationally efficient alternative.
Computation, Methodology