A Two-Phase Population and Subspace Feature-Based Multi-Classification Model to Improve Chronic Disease Diagnosis

Zhong-Sheng Hua, Dian Xiao, Zheng Zhang, Hong-Yu Jia
DOI: https://doi.org/10.1142/s0219622022500559
2022-01-01
International Journal of Information Technology & Decision Making
Abstract: In chronic disease diagnosis with high-dimensional clinical features, feature selection (FS) algorithms are widely applied to avoid data sparsity. Current FS algorithms extract only population features, which are strongly relevant to the states of all patients, and ignore subspace features, which are weakly relevant to the states of all patients but strongly relevant to the states of patients under a certain state. Discarding the relevant information in subspace features worsens the performance of current classification models. To alleviate this feature-extraction conflict in sparse data, we propose a two-phase classification model that considers the relevant information in both population and subspace features. For each patient, the probability of each state is estimated in Phase 1 in a space whose dimensions are the population features, and in Phase 2 in a space whose dimensions are the subspace features under that state. The final classification is based on the results of both phases. Because both population and subspace features are considered and the probability of each state is estimated in a low-dimensional space, the two-phase classification model outperforms the benchmark models in both accuracy and mean absolute error in hepatic fibrosis diagnosis for patients with chronic hepatitis B.
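The two-phase idea can be illustrated with a minimal sketch. The abstract does not state how the features are selected, which probability estimator is used, or how the two phases are combined, so everything below is an assumption made for illustration: the feature index sets are supplied by hand, each phase uses a simple Gaussian naive Bayes estimate, Phase 2 is treated as a one-vs-rest problem per state, and the two probabilities are multiplied and renormalized. The function names (`fit`, `posterior`, `two_phase_predict`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def fit(X, y, features):
    """Fit per-label Gaussian means/variances on the selected feature indices."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append([row[j] for j in features])
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (means, variances, n / len(X))
    return model


def posterior(model, x, features):
    """Return normalized label probabilities for one sample x."""
    log_scores = {}
    for label, (means, variances, prior) in model.items():
        s = math.log(prior)
        for j, m, v in zip(features, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
        log_scores[label] = s
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}


def two_phase_predict(X, y, x, population, subspace):
    """Combine Phase 1 (population features) and Phase 2 (per-state subspace features)."""
    # Phase 1: probability of every state in the population-feature space.
    p1 = posterior(fit(X, y, population), x, population)
    combined = {}
    for state in sorted(set(y)):
        # Phase 2: probability of this state (vs. the rest) in its own subspace.
        feats = subspace[state]
        one_vs_rest = [label == state for label in y]
        p2 = posterior(fit(X, one_vs_rest, feats), x, feats)[True]
        # Assumed combination rule: product of the two phase probabilities.
        combined[state] = p1[state] * p2
    total = sum(combined.values()) or 1.0  # guard against all-zero underflow
    probs = {s: v / total for s, v in combined.items()}
    return max(probs, key=probs.get), probs


# Synthetic toy data (illustration only): feature 0 tracks the state overall,
# feature 1 is informative for state 2, and feature 2 for state 1.
X = [
    [1.0, 0.1, 0.0], [1.2, 0.0, 0.1],
    [2.0, 0.1, 1.0], [2.2, 0.0, 1.1],
    [3.0, 1.0, 0.0], [3.2, 1.1, 0.1],
]
y = [0, 0, 1, 1, 2, 2]
population = [0]
subspace = {0: [1, 2], 1: [2], 2: [1]}

state, probs = two_phase_predict(X, y, [2.1, 0.05, 1.05], population, subspace)
print(state, probs)
```

Because each phase works with only a few feature indices, every probability is estimated in a low-dimensional space, which is the property the abstract credits for the model's performance. The product rule is only one plausible way to merge the phases; a weighted average would fit the same description.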