Feature Selection and Prediction of Sub-health State Using SVM-RFE

Li-min Wang, Jia-xu Chen, Yu Pei, Xin Zhao, Hua-ting Cui, Hai-zhen Cui
DOI: https://doi.org/10.1109/aici.2010.280
2010-01-01
Abstract: Sub-health is a low-quality state between health and disease. The aim of this study was to determine which factors, alone or in combination, are predictive of the sub-health state. We carried out a clinical epidemiology survey and obtained two datasets, each covering 50 reported symptoms. Dataset 1 consists of 572 samples, of which 523 were in the sub-health state and 49 were healthy. Dataset 2 consists of 185 samples, of which 131 were in the sub-health state and 54 were healthy. Dataset 1 was used to select variables and to estimate the performance of the SVM classifier, while Dataset 2 was used to validate the classifier built on Dataset 1. Based on associations defined by mutual information, we propose a feature selection method based on support vector machine recursive feature elimination (SVM-RFE) to predict the sub-health state from clinical data. We took the optimal operating point to be the threshold at which sensitivity and specificity were 0.82 and 0.72, respectively. The method achieved an average prediction accuracy of 80.35%. The top 8 features (symptoms) selected by SVM-RFE were: Fatigue, Degree of insomnia, Pessimism, Constipation, Dysphoria, Giddiness, Anorexia and Vexation. We therefore propose SVM-RFE as a new feature selection method for classification problems. The goal is to remove many features during each iteration without eliminating the important ones.
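The recursive elimination loop that SVM-RFE refers to can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy, a linear kernel, the common squared-weight ranking criterion, and class weighting to cope with imbalance. The data are synthetic (generated to match only the size and class ratio of Dataset 1), and the function name `svm_rfe` and its parameters `n_keep` and `step` are hypothetical. The mutual information step mentioned in the abstract is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def svm_rfe(X, y, n_keep, step=1):
    """Hypothetical helper: drop the lowest-ranked features until n_keep remain."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    eliminated = []                      # indices in the order they were dropped
    while len(remaining) > n_keep:
        # Train a linear SVM on the surviving features only.
        clf = SVC(kernel="linear", C=1.0, class_weight="balanced")
        clf.fit(X[:, remaining], y)
        # Rank features by squared weight (an assumed, commonly used criterion).
        scores = clf.coef_.ravel() ** 2
        # Remove up to `step` features, never going below n_keep.
        n_drop = min(step, len(remaining) - n_keep)
        drop_positions = np.argsort(scores)[:n_drop]
        # Pop from the highest position down so earlier positions stay valid.
        for pos in sorted((int(p) for p in drop_positions), reverse=True):
            eliminated.append(remaining.pop(pos))
    return remaining, eliminated


# Synthetic stand-in data: 572 samples, 50 features, roughly 9% minority class.
X, y = make_classification(
    n_samples=572, n_features=50, n_informative=8, n_redundant=0,
    weights=[0.09, 0.91], random_state=0,
)

selected, eliminated = svm_rfe(X, y, n_keep=8, step=1)
print("indices of the 8 retained features:", selected)
```

A larger `step` removes several features per iteration and so trains fewer SVMs, at the risk of discarding a feature that would have ranked higher once its neighbours were gone; that is the trade-off the abstract's closing sentence describes.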