Structured Sparse Logistic Regression with Application to Lung Cancer Prediction Using Breath Volatile Biomarkers

Xiaochen Zhang, Qingzhao Zhang, Xiaofeng Wang, Shuangge Ma, Kuangnan Fang
DOI: https://doi.org/10.1002/sim.8454
2019-01-01
Statistics in Medicine
Abstract: This article is motivated by a study of lung cancer prediction using breath volatile organic compound (VOC) biomarkers. The challenge is that the predictors include not only high-dimensional time-dependent or functional VOC features but also time-independent clinical variables. We consider high-dimensional logistic regression and propose two penalties, a group spline penalty and a group smooth penalty, to handle the group structure of the time-dependent variables in the model. Compared with existing methods such as the group lasso and group bridge, the new methods are advantageous when the model coefficients are sparse but change smoothly within each group. The methods are easy to implement because, after certain transformations, they reduce to a group minimax concave penalty (MCP) problem. We show that our fitting algorithm possesses the descent property and has attractive convergence properties. Simulation studies and the lung cancer application demonstrate the accuracy and stability of the proposed approaches.
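The abstract says the proposed penalties reduce to a group MCP problem after certain transformations, but it does not define the group spline or group smooth penalty or the transformation itself. The following is therefore only a minimal sketch of that final stage, assuming the transformation has already been applied: logistic regression with a group MCP on equal-sized groups, using the standard MCP form rho(t; lam, gamma) = lam*t - t^2/(2*gamma) for t <= gamma*lam and gamma*lam^2/2 otherwise, evaluated at each group's Euclidean norm. The fitting method here is plain proximal gradient descent with step size at most 1/L. It is swapped in for illustration and is not necessarily the authors' algorithm. With this step size the objective is non-increasing, a simple analogue of the descent property mentioned in the abstract. All function names (`fit_group_mcp_logistic`, `group_mcp_prox`, `simulate`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def mcp(t, lam, gamma):
    # Minimax concave penalty evaluated at a group norm t >= 0.
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def group_mcp_prox(z, lam, gamma, step):
    # Minimizer of 0.5*||b - z||^2 + step * mcp(||b||_2); unique when gamma > step.
    norm = math.sqrt(sum(v * v for v in z))
    if norm > gamma * lam:
        return list(z)                    # flat part of MCP: no shrinkage at all
    if norm <= step * lam:
        return [0.0] * len(z)             # the whole group is set to zero
    scale = gamma / (gamma - step) * (1.0 - step * lam / norm)
    return [scale * v for v in z]         # rescaled group soft-thresholding


def objective(X, y, b0, beta, groups, lam, gamma):
    # Average logistic negative log-likelihood plus the group MCP penalty.
    loss = 0.0
    for xi, yi in zip(X, y):
        eta = b0 + sum(a * c for a, c in zip(xi, beta))
        loss += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    pen = sum(mcp(math.sqrt(sum(beta[j] ** 2 for j in g)), lam, gamma) for g in groups)
    return loss / len(y) + pen


def fit_group_mcp_logistic(X, y, groups, lam, gamma=3.0, iters=500):
    # Proximal gradient descent; the intercept b0 is left unpenalized.
    n, p = len(X), len(X[0])
    # Upper bound on the Lipschitz constant of the loss gradient (ones column included).
    lip = (sum(v * v for row in X for v in row) + n) / (4.0 * n)
    # step <= 1/L makes the objective non-increasing; step < gamma keeps the prox exact.
    step = min(1.0 / lip, 0.5 * gamma)
    b0, beta = 0.0, [0.0] * p
    history = [objective(X, y, b0, beta, groups, lam, gamma)]
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(a * c for a, c in zip(xi, beta))) - yi
                 for xi, yi in zip(X, y)]
        grad = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n       # plain gradient step for the intercept
        z = [beta[j] - step * grad[j] for j in range(p)]
        for g in groups:
            new_g = group_mcp_prox([z[j] for j in g], lam, gamma, step)
            for j, v in zip(g, new_g):
                beta[j] = v
        history.append(objective(X, y, b0, beta, groups, lam, gamma))
    return b0, beta, history


def simulate(n=200, seed=1):
    # Hypothetical toy data: 3 groups of 3 features; only group 0 affects the outcome,
    # with coefficients that change smoothly within the group.
    rng = random.Random(seed)
    true_beta = [1.5, 1.0, 0.5] + [0.0] * 6
    X = [[rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(n)]
    y = [1.0 if rng.random() < sigmoid(sum(a * c for a, c in zip(row, true_beta))) else 0.0
         for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    b0, beta, hist = fit_group_mcp_logistic(X, y, groups, lam=0.15)
    print([round(b, 3) for b in beta])
```

Because the groups here have equal size, the group-size weighting often attached to group penalties is omitted. Groups whose norm exceeds gamma*lam are left unshrunk, which is the property that distinguishes MCP from the group lasso.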