FES-RF - A Feature Ensemble Selection Based Random Forest Method for Accurate Cancer Screening.

Jiatong Liu, Changbin Pan, Dongdong Chen, WeiPing Lin, Shangyuan Feng, Sufang Qiu, Beizhan Wang, KunHong Liu
DOI: https://doi.org/10.1109/bibm52615.2021.9669416
2021-01-01
Abstract: Cancer diagnosis and analysis usually rely on rough judgments based on accumulated professional knowledge, an approach that struggles with large numbers of patient samples and with a wide variety of causes and symptoms. Moreover, most existing machine learning methods are black boxes and cannot provide a reasonable basis for a diagnosis. An accurate and interpretable method is therefore urgently needed for cancer diagnosis. In this paper, a total of 700 serum samples from three groups of patients and one group of healthy individuals were collected and subjected to surface-enhanced Raman spectroscopy (SERS) measurements. We rank the Raman spectral features of the 700 human sera by feature importance and construct a feature importance vector that reflects the importance of each spectral feature. We then construct candidate feature sets by importance-based selection and use them to build a random forest model based on feature ensemble selection. On the one hand, we compare the proposed method with popular machine learning methods to verify its effectiveness in cancer screening. On the other hand, we conduct qualitative and quantitative analyses of cancer characteristics and give both a model-based rationale and a biomedical explanation for how the different important characteristics affect the final classification and diagnosis. Additional experimental results and discussion are included in the appendix. Our source code and appendix are available at https://github.com/liujiatong429/BIBM2021
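The abstract describes the pipeline only at a high level: rank features by importance, build candidate feature sets from that ranking, then combine models trained on those sets. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes nested top-k candidate sets and a simple majority vote, neither of which is confirmed by the abstract; the function names, the importance values, and the stand-in `toy_model` are all hypothetical.

```python
from collections import Counter

def rank_features(importances):
    """Return feature indices sorted from most to least important."""
    return sorted(range(len(importances)),
                  key=lambda i: importances[i], reverse=True)

def candidate_feature_sets(importances, sizes):
    """Assumption: candidate sets are the top-k ranked features for each k."""
    ranked = rank_features(importances)
    return [ranked[:k] for k in sizes if 0 < k <= len(ranked)]

def majority_vote(labels):
    """Assumption: per-set predictions are combined by majority vote."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(models, feature_sets, sample):
    """Apply each model to its own feature subset, then vote on the label."""
    votes = [model([sample[i] for i in subset])
             for model, subset in zip(models, feature_sets)]
    return majority_vote(votes)

# Hypothetical importance vector for four spectral features.
importances = [0.10, 0.50, 0.05, 0.35]
feature_sets = candidate_feature_sets(importances, sizes=[1, 2, 4])

# Stand-in for a trained classifier; a real pipeline would fit one
# random forest per candidate feature set instead.
def toy_model(values):
    return "cancer" if sum(values) > 1.0 else "healthy"

models = [toy_model] * len(feature_sets)
sample = [0.2, 0.9, 0.1, 0.4]
prediction = ensemble_predict(models, feature_sets, sample)
```

In practice the importance vector would come from a fitted model rather than being written by hand, and each entry in `models` would be a classifier trained on the corresponding subset of spectral features.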