Abstract:

Background: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve prediction accuracy. Regularization is one of the effective approaches to gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. Generally, they are divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique that performs continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. A popular regularization technique is the Lasso (L1), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a lower value of q leads to sparser solutions. Moreover, L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations and has been shown to have many attractive properties.

Results: In this work, we investigate sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2 penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of our proposed approach compared with other regularization methods. In particular, on 4 publicly available gene expression datasets, the L1/2 regularization method succeeded using only about 2 to 14 predictors (genes), compared with about 6 to 38 genes for the ordinary L1 and elastic net regularization approaches.

Conclusions: Our evaluations show that sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than the ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
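
The abstract names a univariate half thresholding operator but does not state its formula. The following is a minimal sketch, assuming the closed form commonly given for L1/2 thresholding with cutoff (3/4)·λ^(2/3); the authors' exact operator may differ. The names `half_threshold`, `omega` and `lam` are hypothetical and chosen only for illustration.

```python
import math


def half_threshold(omega: float, lam: float) -> float:
    """Sketch of a univariate half-thresholding operator for an L1/2 penalty.

    Assumed form (not confirmed by the abstract):
      0                                              if |omega| <= (3/4) * lam**(2/3)
      (2/3) * omega * (1 + cos(2 * (pi - phi) / 3))  otherwise,
    with phi = arccos((lam / 8) * (|omega| / 3) ** (-3/2)).
    """
    cutoff = 0.75 * lam ** (2.0 / 3.0)
    if abs(omega) <= cutoff:
        # Small coefficients are set exactly to zero, which yields sparsity.
        return 0.0
    # Above the cutoff the arccos argument lies in [0, 1), so acos is defined.
    phi = math.acos((lam / 8.0) * (abs(omega) / 3.0) ** -1.5)
    # Large coefficients are shrunk toward zero but keep their sign.
    return (2.0 / 3.0) * omega * (1.0 + math.cos(2.0 * (math.pi - phi) / 3.0))
```

In a coordinate descent scheme of the kind the abstract describes, an operator like this would be applied to one coefficient at a time while the others are held fixed. With `lam = 0` the sketch reduces to the identity, so no shrinkage occurs.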

An Aggregation Method for Sparse Logistic Regression

On Regularized Sparse Logistic Regression

Sparse Logistic Regression with a L1/2 Penalty for Gene Selection in Cancer Classification

A Fast Hybrid Algorithm for Large-Scale L1-Regularized Logistic Regression

A Sparse-Group Lasso

Group Logistic Regression Models with l_{p,q} Regularization

Clinical Risk Prediction With Multilinear Sparse Logistic Regression

Network-Regularized Sparse Logistic Regression Models for Clinical Risk Prediction and Biomarker Discovery

Supporting Regularized Logistic Regression Privately and Efficiently

Weighted Lasso Estimates for Sparse Logistic Regression: Non-Asymptotic Properties with Measurement Errors

Compression and Aggregation for Logistic Regression Analysis in Data Cubes

Robust adaptive LASSO in high-dimensional logistic regression

Scalable Estimation and Regularization for the Logistic Normal Multinomial Model

Minimax sparse logistic regression for very high-dimensional feature selection

A Double-Penalized Estimator to Combat Separation and Multicollinearity in Logistic Regression

Simultaneous Dimension Reduction and Variable Selection for Multinomial Logistic Regression

Low-Rank Graph-Regularized Structured Sparse Regression for Identifying Genetic Biomarkers

Feature Screening Strategy for Non-Convex Sparse Logistic Regression with Log Sum Penalty

A new regularization path for logistic regression via linearized Bregman