Abstract: Identifying the few distinguishable genes that separate cancer patients from normal people is challenging when a dataset has only a small number of samples but tens of thousands of genes. To address this problem, this paper proposes novel hybrid gene selection algorithms based on statistical correlation and the K-means algorithm. The Pearson correlation coefficient and the Wilcoxon signed-rank test are each adopted to measure the importance of every gene to the classification; the least important genes are filtered out, and roughly the 10 percent most important genes are preserved as the pre-selected gene subset. The related genes in the pre-selected gene subset are then clustered by the K-means algorithm, and the weight of each gene is calculated from the corresponding coefficient of the SVM classifier. The most important gene of each cluster, that is, the gene with the largest weight or, when the roulette wheel strategy is used, the gene with the most votes, is chosen as the representative gene of that cluster, and the representatives together form the distinguishable gene subset. To verify the effectiveness of the proposed hybrid gene subset selection algorithms, a random selection strategy (named Random) is also used to select the representative genes from the clusters. The proposed algorithms are compared with Random, with the widely used gene selection algorithm SVM-RFE by Guyon, and with the previously studied gene selection algorithm SVM-SFS. Results averaged over 200 runs of these gene selection algorithms on several classic, widely used gene expression datasets demonstrate that the proposed distinguishable gene subset selection algorithms can find the optimal gene subset, and that the classifier built on the selected gene subset achieves very high classification accuracy.
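The filtering and representative-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (pearson, preselect, pick_representatives) are hypothetical, only the Pearson variant of the filter is shown, and the K-means clustering and SVM training are left out, so the cluster memberships and non-negative gene weights are simply passed in as inputs. Reading the roulette wheel strategy as "spin the wheel several times and keep the gene with the most votes" is also an assumption, since the abstract does not give the details.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no information about the label
    return sxy / math.sqrt(sxx * syy)


def preselect(expr, labels, keep_fraction=0.10):
    """Return indices of the top genes ranked by |Pearson r| with the labels.

    expr is a list of genes, each a list of expression values per sample.
    keep_fraction=0.10 mirrors the "about 10 percent" in the abstract.
    """
    scores = [abs(pearson(gene, labels)) for gene in expr]
    n_keep = max(1, int(round(keep_fraction * len(expr))))
    ranked = sorted(range(len(expr)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def pick_representatives(clusters, weights, strategy="max", n_spins=100, rng=None):
    """Choose one representative gene per cluster.

    clusters: list of lists of gene indices (assumed output of K-means).
    weights:  dict gene index -> non-negative weight (assumed SVM-derived).
    strategy: "max" (largest weight), "roulette" (most votes over n_spins
              weight-proportional draws; an assumed reading), or "random".
    """
    rng = rng or random.Random(0)
    reps = []
    for genes in clusters:
        if strategy == "max":
            reps.append(max(genes, key=lambda g: weights[g]))
        elif strategy == "roulette":
            w = [weights[g] for g in genes]
            if sum(w) == 0:
                w = [1.0] * len(genes)  # fall back to a uniform wheel
            votes = Counter(rng.choices(genes, weights=w, k=n_spins))
            reps.append(votes.most_common(1)[0][0])
        elif strategy == "random":
            reps.append(rng.choice(genes))
        else:
            raise ValueError("unknown strategy: " + strategy)
    return reps
```

In a fuller pipeline, the indices returned by preselect would be clustered (for example with K-means) and weighted by a trained linear SVM before pick_representatives is called; both steps are omitted here to keep the sketch free of external dependencies.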

A Stable Gene Subset Selection Algorithm For Cancers

A Cancer Gene Selection Algorithm Based on the K-S Test and CFS

Statistical Correlation and K-Means Based Distinguishable Gene Subset Selection Algorithms

Gene Selection Algorithm Based on Correlation Analysis

An Ensemble Correlation-Based Gene Selection Algorithm for Cancer Classification with Gene Expression Data

Novel Hybrid Method for Gene Selection and Cancer Prediction

Parameters Selection in Gene Selection Using Gaussian Kernel Support Vector Machines by Genetic Algorithm

Ensemble gene selection for cancer classification

Gene Selection for Cancer Clustering Analysis Based on Expression Data

Model-free Gene Selection Using Genetic Algorithms

Gene Selection Using Genetic Algorithm and Support Vectors Machines

Minimum Bayesian Error Probability-Based Gene Subset Selection

On Gene Selection and Classification for Cancer Microarray Data Using Multi-Step Clustering and Sparse Representation

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Gene Selection for the Discrimination of Colorectal Cancer

Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis

Gene Markers Identification Algorithm for Detecting Colon Cancer Patients

Gene selection algorithm based on K-S test and mRMR

A Survey of Gene Selection and Classification Techniques Based on Cancer Microarray Data Analysis

Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

Feature selection on cancer classification by a two-step clustering algorithm