Variable Selection for High-dimensional Generalized Linear Models using an Iterated Conditional Modes/Medians Algorithm

Vitara Pungpapong, Min Zhang, Dabao Zhang
DOI: https://doi.org/10.48550/arXiv.1707.08298
2017-08-09
Abstract: High-dimensional linear and nonlinear models have been extensively used to identify associations between response and explanatory variables. The variable selection problem is commonly of interest in the presence of massive and complex data. An empirical Bayes model for high-dimensional generalized linear models (GLMs) is considered in this paper. An extension of the Iterated Conditional Modes/Medians (ICM/M) algorithm is proposed to fit a GLM. With the construction of pseudodata and pseudovariances based on iteratively reweighted least squares (IRLS), conditional modes are employed to obtain data-driven optimal values for hyperparameters, and conditional medians are used to estimate regression coefficients. With a spike-and-slab prior for each coefficient, a conditional median can perform coefficient estimation and variable selection at the same time. The ICM/M algorithm can also incorporate more complicated priors by taking network structural information into account through the Ising model prior. Here we focus on two extensively used models for genomic data: binary logistic and Cox's proportional hazards models. The performance of the proposed method is demonstrated through both simulation studies and real data examples. The implementation of the ICM/M algorithm for both linear and nonlinear models can be found in the icmm R package, which is freely available on CRAN.
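The abstract does not spell out how the pseudodata and pseudovariances are built, so the following is only a minimal sketch of the textbook IRLS working response for binary logistic regression, not the paper's or the icmm package's actual code. The function name `irls_pseudodata` and its arguments are hypothetical. For a current linear predictor eta_i and fitted probability p_i, the standard IRLS weight is w_i = p_i(1 - p_i), the pseudodata value is z_i = eta_i + (y_i - p_i) / w_i, and the pseudovariance is assumed here to be 1 / w_i.

```python
import math


def irls_pseudodata(eta, y):
    """Standard IRLS working response for binary logistic regression.

    eta: current linear predictors x_i' beta (list of floats)
    y:   binary responses coded 0 or 1
    Returns (pseudodata, pseudovariances) as two lists.
    Hypothetical helper for illustration; not taken from the icmm package.
    """
    pseudodata, pseudovariances = [], []
    for eta_i, y_i in zip(eta, y):
        p = 1.0 / (1.0 + math.exp(-eta_i))  # fitted probability
        w = p * (1.0 - p)                   # IRLS weight (variance function)
        pseudodata.append(eta_i + (y_i - p) / w)
        pseudovariances.append(1.0 / w)     # assumed pseudovariance: inverse weight
    return pseudodata, pseudovariances


if __name__ == "__main__":
    z, v = irls_pseudodata([0.0, 0.0], [1, 0])
    print(z, v)
```

With both linear predictors at zero, the fitted probabilities are 0.5, so the pseudodata values are 2 and -2 and both pseudovariances are 4. Under this construction, each IRLS step treats the pseudodata as approximately Gaussian responses, which is what would let a linear-model routine be reused for a GLM.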