Bayesian shrinkage methods for partially observed data with many predictors

Philip S. Boonstra, Bhramar Mukherjee, Jeremy M. G. Taylor
DOI: https://doi.org/10.1214/13-AOAS668
2014-01-10
Abstract: Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome $Y$ to a large number of covariates $\mathbf{X}$, for example, measurements from current, state-of-the-art technology. For most of the samples, only the outcome $Y$ and surrogate covariates, $\mathbf{W}$, are available. These surrogates may be data from prior studies using older technologies. Owing to the dimension of the problem and the large fraction of missing information, a critical issue is appropriate shrinkage of model parameters for an optimal bias-variance trade-off. We discuss a variety of fully Bayesian and Empirical Bayes algorithms which account for uncertainty in the missing data and adaptively shrink parameter estimates for superior prediction. These methods are evaluated via a comprehensive simulation study. In addition, we apply our methods to a lung cancer data set, predicting survival time ($Y$) using qRT-PCR ($\mathbf{X}$) and microarray ($\mathbf{W}$) measurements.
Applications