Abstract:

Background: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers significantly helps to classify different cancer types and improves prediction accuracy. Regularization is one of the effective approaches to gene selection in microarray data, which generally contain a large number of genes but only a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. They are generally divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique that performs continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. A popular regularization technique is the Lasso (L1), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a lower value of q should lead to better solutions with more sparsity. Moreover, L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations and has been shown to have many attractive properties.

Results: In this work, we investigate sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2 penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of the proposed approach compared with other regularization methods. In particular, on 4 publicly available gene expression datasets, the L1/2 regularization method succeeded using only about 2 to 14 predictors (genes), compared to about 6 to 38 genes for the ordinary L1 and elastic net regularization approaches.

Conclusions: Our evaluations show that sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than the ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
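
To make the thresholding step mentioned in the abstract more concrete, here is a minimal sketch of a univariate half thresholding operator next to the soft thresholding operator used for L1. The abstract does not give the formula of the authors' new operator, so this sketch assumes the closed form commonly given for half thresholding in the L1/2 literature, which minimizes (x - z)^2 + lam * |x|^(1/2) for a scalar z. The authors' variant may differ, for example in the threshold constant. The function names and the parameter name lam are illustrative only.

```python
import numpy as np


def half_threshold(z, lam):
    """Half thresholding: minimizer of (x - z)**2 + lam * |x|**0.5.

    Assumed textbook form, not necessarily the paper's new operator.
    """
    z = np.asarray(z, dtype=float)
    a = np.abs(z)
    # Inputs at or below this magnitude are set exactly to zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    keep = a > thresh
    # Placeholder value 1.0 avoids dividing by zero for entries we discard.
    safe = np.where(keep, a, 1.0)
    arg = np.clip((lam / 8.0) * (safe / 3.0) ** (-1.5), -1.0, 1.0)
    phi = np.arccos(arg)
    shrunk = (2.0 / 3.0) * z * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return np.where(keep, shrunk, 0.0)


def soft_threshold(z, lam):
    """Soft thresholding: minimizer of (x - z)**2 + lam * |x| (the L1 case)."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)


if __name__ == "__main__":
    z = np.array([-10.0, -2.0, -0.9, 0.5, 2.0, 10.0])
    print("half:", half_threshold(z, 1.0))
    print("soft:", soft_threshold(z, 1.0))
```

In a coordinate descent scheme, an operator of this kind would be applied to one coefficient at a time while the others are held fixed. Compared with soft thresholding, it zeroes out a wider band of small inputs and shrinks large inputs less, which is consistent with the sparser gene sets reported above.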

Global Feature Selection from Microarray Data Using Lagrange Multipliers

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

Parameters Selection in Gene Selection Using Gaussian Kernel Support Vector Machines by Genetic Algorithm

Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data

An Adaptive Feature Selection Method for Microarray Data Analysis

The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis

Support Vector Machine-Recursive Feature Elimination for Localized Feature Selection

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Optimal Feature Selection for Sparse Linear Discriminant Analysis and Its Applications in Gene Expression Data

A kernel-based multivariate feature selection method for microarray data classification.

Sparse Logistic Regression with a L 1/2 Penalty for Gene Selection in Cancer Classification

Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data

A Hybrid Feature-Selection Method Based on mRMR and Binary Differential Evolution for Gene Selection

Pathway-based feature selection algorithms identify genes discriminating patients with multiple sclerosis apart from controls

Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

MGRFE: Multilayer Recursive Feature Elimination Based on an Embedded Genetic Algorithm for Cancer Classification

Multilevel Feature Selection Method for Improving Classification of Microarray Gene Expression Data

Hybrid feature selection based on SLI and genetic algorithm for microarray datasets

Subsampling Winner Algorithm for Feature Selection in Large Regression Data

Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data

Gene selection and classification for cancer microarray data based on machine learning and similarity measures