A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization

Mingyue Sheng,Wei Du,Yuan Tian,Yanchun Liang
DOI: https://doi.org/10.2991/bep-16.2017.14
2016-01-01
Abstract:It is an important and hot topic for researchers to develop an efficient and robust feature selection method from gene expression profile data with thousands of genes and small sample size. At present, most of feature selection methods are constructed models to use all samples of gene expression data, but these methods are never considered the influence of outlier samples and the distribution of samples. Besides, it is well known that cancer is a kind of heterogeneous disease, and different cancer tissue samples of same organs have many different subtypes on molecular characteristics. So, we should select samples with the same genetic characteristics to construct models. Therefore, in this article, we proposed a novel and efficient feature selection approach based on localized samples to extract gene signatures more accurately. We picked out the nearest samples in a certain range for each target sample and obtained the best localized samples by constructing a sample-sample similarity network, which calculated Euclidean distance between the central samples with others by using gene expression values firstly. Secondly, we established the co-expression networks by selecting top nearest samples, and formed steady-state probability network applying to Random Walk with Restart (RWR) method. Finally, through dividing into this network and comparing five selection strategies, we got localized samples for best cancer classification. We applied our method on six datasets across different cancer types. The average accuracies of top 100 genes of the method by SVM classifiers in leave-one-out cross validation (LOOCV) are 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. The results show that the proposed method obtains excellent performance on these datasets. It also indicates that the proposed method is effective and applicable.
What problem does this paper attempt to address?