Abstract: In biomedical research, high-dimensional data applications often rely on statistical and machine-learning algorithms to identify an optimal signature, based on biomarkers and other patient characteristics, that predicts the desired clinical outcome. Both the composition and the predictive performance of such biomarker signatures are critical in these applications. When the number of features is large, however, conventional regression analysis fails to yield a good prediction model. A widely used remedy is to introduce regularization when fitting the regression model. In particular, an L1 penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance; that is, feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model, depend on the choice of the penalty parameter used in the L1 regularization. The penalty parameter is often chosen by K-fold cross-validation. However, this procedure tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from its internal cross-validation tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application on a real dataset.
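The two ideas in the abstract, repeating K-fold cross-validation many times to stabilize the penalty choice and wrapping the whole selection in an outer cross-validation loop, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Gaussian linear model without intercept (rather than a general GLM or Cox model), the coordinate-descent solver, the fixed penalty grid, the aggregation of the repeated choices by their lower median, and all function names.

```python
import random
import statistics

def dot(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def soft_threshold(z, g):
    """Soft-thresholding operator used by the L1 coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j excluded).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_select(X, y, lambdas, k, rng):
    """One K-fold CV run with a random fold split; returns the best penalty."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    errs = [0.0] * len(lambdas)
    for f in range(k):
        held = set(idx[f::k])
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        for m, lam in enumerate(lambdas):
            beta = lasso_cd(Xtr, ytr, lam)
            errs[m] += sum((y[i] - dot(beta, X[i])) ** 2 for i in held)
    return lambdas[min(range(len(lambdas)), key=errs.__getitem__)]

def monte_carlo_lambda(X, y, lambdas, k=5, n_runs=5, seed=0):
    """Repeat K-fold CV over random splits; aggregate by lower median (assumed rule)."""
    rng = random.Random(seed)
    picks = [cv_select(X, y, lambdas, k, rng) for _ in range(n_runs)]
    return statistics.median_low(picks), picks

def nested_cv_mse(X, y, lambdas, outer_k=3, seed=0):
    """Outer CV wrapper: the penalty is chosen on training folds only."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    sse = 0.0
    for f in range(outer_k):
        test = idx[f::outer_k]
        held = set(test)
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        lam, _ = monte_carlo_lambda(Xtr, ytr, lambdas, seed=seed + f + 1)
        beta = lasso_cd(Xtr, ytr, lam)
        sse += sum((y[i] - dot(beta, X[i])) ** 2 for i in test)
    return sse / len(X)

# Synthetic data: two informative features out of six (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * x[0] - 1.5 * x[1] + data_rng.gauss(0, 0.5) for x in X]
lambdas = [0.01, 0.05, 0.1, 0.5, 1.0]

lam_hat, picks = monte_carlo_lambda(X, y, lambdas)
mse = nested_cv_mse(X, y, lambdas)
print("penalty chosen in each run:", picks)
print("aggregated penalty:", lam_hat)
print("outer-CV mean squared error:", mse)
```

Because the outer test folds never influence the penalty choice, the outer-loop error is not subject to the optimism of the inner cross-validation error, which is the motivation the abstract gives for the wrapper. The list of per-run choices in `picks` shows how much a single K-fold run can vary with the fold split.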

Adaptive Lightweight Regularization Tool for Complex Analytics

Improving Data Analytics with Fast and Adaptive Regularization

An Aggressive Reduction on the Complexity of Optimization for Non-Strongly Convex Objectives

LDA-Reg: Knowledge Driven Regularization using External Corpora

Genetic Programming Based On An Adaptive Regularization Method

Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear Models

Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

The Implicit Regularization for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Adaptive ensemble of classifiers with regularization for imbalanced data classification

Regularized EM Algorithms: A Unified Framework and Statistical Guarantees

Adaptive Regularization of Labels

Regularization for Adversarial Robust Learning

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

A generalization of regularized dual averaging and its dynamics

Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models

Adaptive debiased SGD in high-dimensional GLMs with streaming data

Adaptive Attention-Driven Manifold Regularization for Deep Learning Networks: Industrial Predictive Modeling Applications and Beyond

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

An Adaptive Gradient Regularization Method

Discounted Adaptive Online Learning: Towards Better Regularization

Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families