A Hybrid Feature Selection Algorithm Used in Disease Association Study

Bin Wei, Qinke Peng, Xuejiao Kang, Chenyao Li
DOI: https://doi.org/10.1109/wcica.2010.5554442
2010-01-01
Abstract: With the rapid development of high-throughput genotyping technologies, increasing attention has been paid to disease association studies, which identify DNA variations that are highly associated with a specific disease. A main challenge in such studies is finding the optimal subsets of Single Nucleotide Polymorphisms (SNPs) that are most tightly associated with a disease. Feature selection, which can effectively reduce computational complexity, has become a necessity in many bioinformatics applications. We therefore present a prediction algorithm based on a support vector machine (SVM) combined with a hybrid feature selection method named F-score and compact GA (FCGA). FCGA combines the advantages of filter and wrapper methods: it not only eliminates redundant features and reduces computing time but also solves the problem of SVM parameter selection. We apply this algorithm to a lung cancer dataset of 595 samples, each with 141 SNPs. To evaluate its prediction accuracy, we compare it with Naive Bayes and with several commonly used feature selection methods. The experimental results show that the proposed algorithm achieves the highest accuracy among the compared methods.
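The two stages that FCGA combines can be illustrated with a short sketch. This is not the authors' implementation. It assumes the widely used two-class F-score for the filter stage: the separation of the per-class feature means from the overall mean, divided by the sum of the within-class sample variances. For the wrapper stage it assumes the standard compact GA, which keeps a probability vector instead of a population and nudges it toward the winner of a pairwise tournament. The abstract does not describe the chromosome encoding. Here the chromosome is only a feature mask, whereas in FCGA it presumably also encodes the SVM parameters. The functions `f_scores` and `compact_ga`, the top-2 cut-off, and `toy_fitness` (a stand-in for SVM cross-validation accuracy) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def f_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Two-class F-score of each feature (column) of X; labels in y are 1 or 0.

    F(i) = ((m_pos - m)^2 + (m_neg - m)^2) / (var_pos + var_neg),
    where m, m_pos, m_neg are the overall and per-class means of feature i and
    the per-class variances use the unbiased (n - 1) denominator.
    Each class needs at least two samples.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        col_pos = [row[i] for row in pos]
        col_neg = [row[i] for row in neg]
        m = sum(col) / len(col)
        m_pos = sum(col_pos) / len(col_pos)
        m_neg = sum(col_neg) / len(col_neg)
        var_pos = sum((v - m_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
        var_neg = sum((v - m_neg) ** 2 for v in col_neg) / (len(col_neg) - 1)
        num = (m_pos - m) ** 2 + (m_neg - m) ** 2
        den = var_pos + var_neg
        if den > 0:
            scores.append(num / den)
        else:  # constant within each class: perfectly separating or useless
            scores.append(math.inf if num > 0 else 0.0)
    return scores


def compact_ga(fitness: Callable[[List[int]], float], n_bits: int,
               pop_size: int = 50, max_iters: int = 5000, seed: int = 0) -> List[int]:
    """Standard compact GA over bit strings; returns the most probable string.

    counts[i] / pop_size is the probability that bit i is 1. Each iteration
    samples two individuals, and wherever they differ the probability moves
    by 1 / pop_size toward the tournament winner's bit.
    """
    rng = random.Random(seed)
    counts = [pop_size // 2] * n_bits  # start at p_i = 0.5 (for even pop_size)
    for _ in range(max_iters):
        a = [1 if rng.random() < c / pop_size else 0 for c in counts]
        b = [1 if rng.random() < c / pop_size else 0 for c in counts]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                step = 1 if winner[i] == 1 else -1
                counts[i] = min(pop_size, max(0, counts[i] + step))
        if all(c in (0, pop_size) for c in counts):  # probability vector converged
            break
    return [1 if 2 * c >= pop_size else 0 for c in counts]


if __name__ == "__main__":
    # Toy data: feature 0 tracks the label, feature 1 weakly, feature 2 not at all.
    X = [[1, 0, 5], [2, 1, 3], [1, 0, 4], [4, 1, 5], [5, 0, 3], [4, 1, 4]]
    y = [1, 1, 1, 0, 0, 0]
    scores = f_scores(X, y)

    # Filter stage: keep the two highest-scoring features (hypothetical cut-off).
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]

    # Wrapper stage: hypothetical stand-in for SVM cross-validation accuracy,
    # rewarding informative features and penalising subset size.
    def toy_fitness(mask: List[int]) -> float:
        return sum(scores[i] for i, bit in zip(kept, mask) if bit) - 0.1 * sum(mask)

    best_mask = compact_ga(toy_fitness, len(kept))
    print("F-scores:", scores)
    print("selected features:", [i for i, bit in zip(kept, best_mask) if bit])
```

The filter stage is cheap and shrinks the search space before the more expensive wrapper stage runs, which matches the computing-time argument in the abstract. In a real pipeline, `toy_fitness` would train and cross-validate an SVM on the masked SNP columns. The toy version only keeps the sketch dependency-free.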