Algorithmic Randomness Based Feature Selection for Traditional Chinese Chronic Gastritis Diagnosis.

Huazhen Wang, Bing Lv, Fan Yang, Kai Zheng, Xuan Li, Xueqin Hu
DOI: https://doi.org/10.1016/j.neucom.2014.03.016
IF: 6
2014-01-01
Neurocomputing
Abstract: Machine learning methods that account for multivariate interaction effects have become mainstream in feature selection. However, the feature importance scores generated by these methods are not statistically interpretable, which hampers their application in practice, for example in medical diagnosis. In this study, a framework of Algorithmic Randomness based Feature Selection (ARFS) is proposed that measures the feature importance score using the p-value derived from combining an algorithmic randomness test with machine learning methods. In ARFS, a machine learning algorithm, such as random forest (RF), support vector machine (SVM) or naïve Bayes classifier (NB), is used to compute the nonconformity score of each example with respect to the data distribution, and the p-value of the algorithmic randomness test is then obtained from these nonconformity scores. ARFS evaluates the importance of each feature as the reduction in p-value on the dataset before and after random permutation of that feature, which makes the score statistically interpretable. To demonstrate its efficiency, three ARFS models, i.e. ARFS-RF, ARFS-SVM and ARFS-NB, were compared with several feature selection approaches, i.e. RF-ACC, RF-Gini, KNNpermute, SMFS, ANOVA and SNR. The results showed that ARFS-RF performed better on both the synthetic and the benchmark datasets. A further study on a chronic gastritis dataset in Traditional Chinese Medicine (TCM) showed that the symptom sets given by ARFS-RF perform substantially better than sets of the same size chosen by TCM experts. The symptom ranking list generated by ARFS-RF can offer guidance to physicians in designing, selecting, and interpreting symptoms in chronic gastritis diagnosis.
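The permutation-based p-value scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact nonconformity measure or p-value formula, so the sketch assumes a standard inductive conformal p-value (the share of calibration scores at least as large as the test score) and swaps in a simple nearest-centroid nonconformity measure in place of RF, SVM or NB to stay dependency-free. All function names (`centroids`, `nonconformity`, `p_values`, `arfs_importance`, `make_data`) and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def centroids(X, y):
    """Per-class mean vector (stand-in for the underlying learner)."""
    out = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        out[label] = [mean(col) for col in zip(*rows)]
    return out


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


def nonconformity(X, y, cents):
    """Assumed measure: distance to own centroid / distance to nearest other centroid."""
    scores = []
    for x, t in zip(X, y):
        own = dist(x, cents[t])
        other = min(dist(x, c) for lab, c in cents.items() if lab != t)
        scores.append(own / (other + 1e-12))
    return scores


def p_values(cal_scores, test_scores):
    """Conformal p-value: share of calibration scores >= the test score."""
    n = len(cal_scores) + 1
    return [(sum(a >= s for a in cal_scores) + 1) / n for s in test_scores]


def arfs_importance(X_train, y_train, X_cal, y_cal, X_test, y_test, rng):
    """Importance of feature j = mean p-value before minus after permuting j."""
    cents = centroids(X_train, y_train)
    cal_scores = nonconformity(X_cal, y_cal, cents)
    base = mean(p_values(cal_scores, nonconformity(X_test, y_test, cents)))
    importance = []
    for j in range(len(X_test[0])):
        col = [x[j] for x in X_test]
        rng.shuffle(col)  # break the link between feature j and the label
        X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X_test, col)]
        perm = mean(p_values(cal_scores, nonconformity(X_perm, y_test, cents)))
        importance.append(base - perm)
    return importance


def make_data(n, rng):
    """Synthetic data: feature 0 separates the two classes, feature 1 is noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    train, cal, test = make_data(200, rng), make_data(200, rng), make_data(200, rng)
    print(arfs_importance(*train, *cal, *test, rng))
```

In this toy setting, permuting the informative feature makes many test examples look nonconforming, so their p-values drop and the feature receives a larger importance score than the noise feature. Because the score is a difference of p-values, it can be read against a significance level, which is the interpretability argument the abstract makes.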