Abstract: In biomedical research, high-dimensional data applications often rely on statistical and machine-learning algorithms to identify an optimal signature, based on biomarkers and other patient characteristics, that predicts the desired clinical outcome. Both the composition and the predictive performance of such biomarker signatures are critical. With a large number of features, however, conventional regression analysis fails to yield a good prediction model. A widely used remedy is to introduce regularization when fitting the regression model. In particular, an L1 penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance; that is, feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model depend on the choice of the penalty parameter used in the L1 regularization. The penalty parameter is often chosen by K-fold cross-validation. However, this procedure tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application with a real dataset.
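The two ideas in the abstract, repeating K-fold cross-validation over many random fold assignments and wrapping the whole selection step in an outer cross-validation loop, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`lasso_cd`, `cv_select_lambda`, `monte_carlo_lambda`, `outer_cv_mse`), the use of the median to aggregate the selected penalties, a continuous response with squared-error loss, centered data with no intercept, and a plain coordinate-descent lasso solver.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent (no intercept; assumes centered data).
    Minimises (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)                 # equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]      # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

def cv_select_lambda(X, y, lambdas, k, rng):
    """One K-fold CV run: return the grid value with the lowest held-out error."""
    n = len(y)
    err = np.zeros(len(lambdas))
    for test_idx in np.array_split(rng.permutation(n), k):
        train = np.setdiff1d(np.arange(n), test_idx)
        for i, lam in enumerate(lambdas):
            beta = lasso_cd(X[train], y[train], lam)
            err[i] += np.sum((y[test_idx] - X[test_idx] @ beta) ** 2)
    return lambdas[int(np.argmin(err))]

def monte_carlo_lambda(X, y, lambdas, k=5, n_rep=20, seed=0):
    """Repeat K-fold CV over random fold assignments and aggregate the picks.
    The median is an assumed aggregation rule, chosen here for illustration."""
    rng = np.random.default_rng(seed)
    picks = [cv_select_lambda(X, y, lambdas, k, rng) for _ in range(n_rep)]
    return float(np.median(picks)), picks

def outer_cv_mse(X, y, lambdas, k_outer=5, seed=1, **kw):
    """Outer CV wrapper: penalty selection is redone inside each training fold,
    so the held-out fold plays no part in choosing the penalty."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sq_err = 0.0
    for test_idx in np.array_split(rng.permutation(n), k_outer):
        train = np.setdiff1d(np.arange(n), test_idx)
        lam, _ = monte_carlo_lambda(X[train], y[train], lambdas, **kw)
        beta = lasso_cd(X[train], y[train], lam)
        sq_err += np.sum((y[test_idx] - X[test_idx] @ beta) ** 2)
    return sq_err / n
```

A call such as `monte_carlo_lambda(X, y, np.array([0.01, 0.1, 0.5]))` returns the aggregated penalty together with the individual picks, whose spread gives a rough view of how unstable a single cross-validation run would be. The outer wrapper is deliberately expensive, since it repeats the full selection procedure in every outer fold.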

L0-Regularized Learning for High-Dimensional Additive Hazards Regression

High-Dimensional Sparse Additive Hazards Regression

Boosted Nonparametric Hazards with Time-Dependent Covariates

High-dimensional Feature Selection in Competing Risks Modeling: A Stable Approach Using a Split-and-merge Ensemble Algorithm

Regularized Multinomial Regression Method for Hyperspectral Data Classification Via Pathwise Coordinate Optimization

On the Maximum Penalized Full Likelihood Approach for Cox Model with Extreme Value for Heavily Censored Survival Data

High-dimensional variable selection for Cox's proportional hazards model

Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models

Nonnegative Adaptive Lasso for Ultra-High Dimensional Regression Models and A Two-Stage Method Applied in Financial Modeling

Regression analysis of multiplicative hazards model with time-dependent coefficient for sparse longitudinal covariates

Estimating Treatment Effect under Additive Hazards Models with High-dimensional Covariates

Variable selection for proportional hazards models with high‐dimensional covariates subject to measurement error

Additive Hazards Regression Models for Survival Data

Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction

Survival Analysis with Graph-Based Regularization for Predictors

FastSurvival: Hidden Computational Blessings in Training Cox Proportional Hazards Models

Variable selection for nonlinear Cox regression model via deep learning

Risk factor selection in rate making: EM adaptive LASSO for zero-inflated Poisson regression models

Penalized Variable Selection with Broken Adaptive Ridge Regression for Semi-competing Risks Data

Model-X Knockoffs for high-dimensional controlled variable selection under the proportional hazards model with heterogeneity parameter

A scalable and flexible Cox proportional hazards model for high-dimensional survival prediction and functional selection