A mixed integer programming approach for gene selection

Lizhen Shao, Jieli Wang, Guangda Hu, Jiwei Liu
DOI: https://doi.org/10.1109/ICCPS.2013.6893583
2013-01-01
Abstract: It is known that for most gene expression datasets used for cancer classification, the number of samples is quite small compared with the number of genes. Feature selection is therefore an essential pre-processing step, and a challenging problem, for removing irrelevant or redundant genes before classification. In this paper, we model the gene selection problem as a mixed integer programming problem based on the 1-norm support vector machine (SVM). This problem is difficult to solve because the number of integer variables (usually tens of thousands or even hundreds of thousands) is very large compared with the desired number of genes. To solve it, we propose an iterative mixed integer optimization algorithm that gradually selects a subset of genes. We test the proposed approach on colon cancer and leukemia gene expression datasets. The results show that our algorithm performs better than the Fisher criterion, t-statistics, the standard 2-norm SVM, and SVM recursive feature elimination (SVM-RFE). The selected gene subset achieves better classification accuracy and better generalization capability.
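The abstract does not give the exact model, so what follows is only a minimal sketch under stated assumptions. It uses a standard cardinality-constrained 1-norm SVM: minimize ||w||_1 + C·Σ ξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, |w_j| ≤ M·z_j, Σ z_j ≤ k, z_j ∈ {0, 1}. Here one binary z_j per gene is what makes the integer-variable count grow with the number of genes. The model is solved with SciPy's `scipy.optimize.milp` (HiGHS backend, SciPy 1.9 or later). The function names `select_genes_mip` and `iterative_select`, the big-M value, and the "halve the budget each round" schedule are hypothetical illustrations, not the authors' algorithm, and the toy data is made up.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def select_genes_mip(X, y, k, C=1.0, big_m=10.0, tol=1e-6):
    """Cardinality-constrained 1-norm SVM as a MILP (assumed formulation).

    Variables: [u (p), v (p), b (1), xi (n), z (p)], with w = u - v.
    Returns (indices of genes with nonzero weight, optimal objective).
    """
    n, p = X.shape
    N = 3 * p + 1 + n
    iu, iv, ib, ixi, iz = 0, p, 2 * p, 2 * p + 1, 2 * p + 1 + n

    # Objective: ||w||_1 + C * sum(xi); the bias b and the binaries z are free of cost.
    c = np.zeros(N)
    c[iu:ib] = 1.0
    c[ixi:iz] = C

    # Soft-margin constraints: y_i * ((u - v) . x_i + b) + xi_i >= 1.
    A_margin = np.zeros((n, N))
    A_margin[:, iu:iv] = y[:, None] * X
    A_margin[:, iv:ib] = -y[:, None] * X
    A_margin[:, ib] = y
    A_margin[:, ixi:iz] = np.eye(n)

    # Big-M links: u_j <= M z_j and v_j <= M z_j, so w_j = 0 unless z_j = 1.
    A_link = np.zeros((2 * p, N))
    A_link[:p, iu:iv] = np.eye(p)
    A_link[p:, iv:ib] = np.eye(p)
    A_link[:p, iz:] = -big_m * np.eye(p)
    A_link[p:, iz:] = -big_m * np.eye(p)

    # Cardinality: at most k genes may be switched on.
    A_card = np.zeros((1, N))
    A_card[0, iz:] = 1.0

    constraints = [
        LinearConstraint(A_margin, lb=1.0, ub=np.inf),
        LinearConstraint(A_link, lb=-np.inf, ub=0.0),
        LinearConstraint(A_card, lb=-np.inf, ub=k),
    ]
    lb = np.zeros(N)
    ub = np.full(N, np.inf)
    lb[ib] = -np.inf  # bias b is unbounded
    ub[iz:] = 1.0  # z in [0, 1], made binary by integrality below
    integrality = np.zeros(N, dtype=int)
    integrality[iz:] = 1

    res = milp(c, integrality=integrality, bounds=Bounds(lb, ub),
               constraints=constraints)
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[iu:iv] - res.x[iv:ib]
    return np.flatnonzero(np.abs(w) > tol), res.fun


def iterative_select(X, y, k, **kw):
    """Hypothetical shrinking loop: halve the gene budget each round."""
    candidates = np.arange(X.shape[1])
    while len(candidates) > k:
        budget = max(k, len(candidates) // 2)  # always < len(candidates) here
        keep, _ = select_genes_mip(X[:, candidates], y, budget, **kw)
        candidates = candidates[keep]
    return candidates


# Made-up toy data: gene 0 separates the classes; genes 1-4 take the
# same values in both classes and so carry no class information.
y = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
base = np.array([0.3, -1.2, 0.8, 0.1])
noise = [np.concatenate([base + j, base + j]) for j in range(1, 5)]
X = np.column_stack([2.0 * y] + noise)

genes, obj = select_genes_mip(X, y, k=1)
print(genes, obj)  # should pick gene 0 with objective about 0.5
print(iterative_select(X, y, k=1))
```

In this sketch a gene counts as selected only if its weight is nonzero, because a solver may set some z_j = 1 while leaving w_j = 0. Note that each round of the toy loop still carries one binary per remaining candidate gene. It only illustrates "gradual" selection and does not reproduce whatever scaling strategy the paper uses.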