Mutual Information Assisted Ensemble Recommender System for Identifying Critical Risk Factors in Healthcare Prognosis

Abhishek Dey, Debayan Goswami, Rahul Roy, Susmita Ghosh, Yu Shrike Zhang, Jonathan H. Chan
2024-07-01
Abstract: Purpose: Health recommenders act as important decision support systems, helping patients and medical professionals take actions that lead to patients' well-being. These systems extract the information that may be of particular relevance to the end user, helping them make appropriate decisions. The present study proposes a feature recommender, as part of a disease management system, that identifies and recommends the most important risk factors for an illness. Methods: A novel mutual information and ensemble-based feature ranking approach for identifying critical risk factors in healthcare prognosis is proposed. Results: To establish the effectiveness of the proposed method, experiments were conducted on four benchmark datasets covering diverse diseases: clear cell renal cell carcinoma (ccRCC), chronic kidney disease, Indian liver patient, and cervical cancer risk factors. The proposed recommender is compared with four state-of-the-art methods using recommender-system performance metrics: average precision@K, precision@K, recall@K, F1@K, and reciprocal rank@K. The method recommends all relevant critical risk factors for ccRCC. It also attains higher accuracy for ccRCC staging with a reduced feature set than existing methods (96.6% with a support vector machine and 98.6% with a neural network). Moreover, the top two features the proposed method recommends for ccRCC, namely tumor size and metastasis status, are medically validated by the existing TNM staging system. Results are also superior on the other three datasets. Conclusion: The proposed recommender can identify and recommend the risk factors with the most discriminating power for detecting diseases.
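The abstract evaluates the recommender with ranking metrics computed at a cutoff K. As a minimal sketch, assuming a single ranked list of recommended features is scored against a set of features treated as relevant (how the paper defines that set is not stated here), the metrics can be computed as below. The function names and feature names are hypothetical, and AP@K is normalized by min(K, number of relevant features), which is one common convention; the paper may use a different one.

```python
from typing import Sequence, Set

# Hypothetical helpers: `recommended` is a ranked list of feature names
# (best first); `relevant` is the set of features treated as ground truth.


def hits_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> int:
    """Number of relevant features among the top-k recommendations."""
    return sum(1 for f in recommended[:k] if f in relevant)


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return hits_at_k(recommended, relevant, k) / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant features that appear in the top-k."""
    return hits_at_k(recommended, relevant, k) / len(relevant) if relevant else 0.0


def f1_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Harmonic mean of precision@k and recall@k."""
    p = precision_at_k(recommended, relevant, k)
    r = recall_at_k(recommended, relevant, k)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Sum of precision@i at each rank i <= k holding a relevant feature,
    divided by min(k, |relevant|) (one common normalization)."""
    hits, total = 0, 0.0
    for i, f in enumerate(recommended[:k], start=1):
        if f in relevant:
            hits += 1
            total += hits / i
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


def reciprocal_rank_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant feature within the top-k, else 0."""
    for i, f in enumerate(recommended[:k], start=1):
        if f in relevant:
            return 1.0 / i
    return 0.0


# Usage with hypothetical feature names.
recommended = ["age", "tumor_size", "grade", "metastasis_status"]
relevant = {"tumor_size", "metastasis_status"}
for name, fn in [("P@3", precision_at_k), ("R@3", recall_at_k), ("F1@3", f1_at_k),
                 ("AP@3", average_precision_at_k), ("RR@3", reciprocal_rank_at_k)]:
    print(name, round(fn(recommended, relevant, 3), 3))
# Expected: P@3 0.333 | R@3 0.5 | F1@3 0.4 | AP@3 0.25 | RR@3 0.5
```

Here only one relevant feature appears in the top 3, at rank 2, which is why reciprocal rank is 0.5 while AP@3 is penalized further for the relevant feature that was missed.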
Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses the problem of identifying key risk factors in medical prognosis. Specifically, the authors propose a feature recommendation system based on mutual information and an ensemble strategy that identifies and recommends the risk factors with the greatest discriminative power for disease diagnosis and staging.

### Background and Motivation

In the medical field, the sheer volume of clinical data leads to information overload, making it difficult for patients and healthcare professionals to make decisions in health-related contexts. Recommender systems, as part of disease management systems, can help patients and healthcare professionals make accurate and timely decisions by extracting the information that is particularly relevant to the end user.

### Methods

1. **Feature Selection Methods**:
   - Eight different feature selection methods (two wrapper methods, five filter methods, and one embedded method) produce the initial feature rankings.
   - An ensemble strategy combines the ranking information from these methods into the final feature recommendation list.
2. **Ensemble Strategy**:
   - A new ensemble strategy combines the strengths of the eight feature selection algorithms to select key risk factors more robustly.
   - A positional feature recommendation list is constructed, and the final recommendation order is determined from each feature's rankings across the different selection methods.

### Experiments and Results

1. **Experimental Datasets**:
   - Four benchmark datasets covering different diseases were used: clear cell renal cell carcinoma (ccRCC), chronic kidney disease, Indian liver patient, and cervical cancer risk factors.
2. **Performance Evaluation**:
   - Recommender-system metrics (Average Precision@K, Precision@K, Recall@K, F1@K, and Reciprocal Rank@K) are used to evaluate the effectiveness of the proposed method.
   - The proposed method performs better than the compared methods on all four datasets. On ccRCC in particular, it recommends all relevant key risk factors and maintains high staging accuracy with a reduced feature set (96.6% with SVM and 98.6% with a neural network).

### Conclusion

The feature recommendation system proposed in this study identifies and recommends the risk factors with the greatest discriminative power, supporting disease diagnosis and staging. For ccRCC, the top two recommended risk factors (tumor size and metastasis status) agree with an established medical staging system, the AJCC TNM system. Experimental results indicate that the proposed method outperforms the four state-of-the-art methods it was compared against. Future research could extend the approach to medical images, extracting features and recommending regions of interest for analysis and diagnosis.
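The Methods summary describes ranking features with several selectors and merging those rankings by position, and the title says the process is assisted by mutual information. The summary does not spell out how mutual information enters the ensemble or the exact positional rule, so the code below is only a minimal sketch under stated assumptions, not the authors' algorithm. It scores discrete features by empirical mutual information with the class label, treats that as one ranking, and merges it with other complete rankings by mean position (a Borda-style rule). The function names, toy data, and the two extra rankings (standing in for the wrapper, filter, or embedded selectors) are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in nats, for two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xv, yv), count in joint.items():
        p_xy = count / n
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), summed over observed pairs
        mi += p_xy * math.log(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi


def rank_by_mi(features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> List[str]:
    """Rank features by MI with the class label, most informative first."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def positional_ensemble(rankings: List[List[str]]) -> List[str]:
    """Merge rankings by mean position (Borda-style), ties broken by name.
    Assumes every ranking is a complete ordering of the same feature set."""
    positions: Dict[str, List[int]] = {}
    for ranking in rankings:
        for pos, name in enumerate(ranking):
            positions.setdefault(name, []).append(pos)
    mean_pos = {name: sum(p) / len(p) for name, p in positions.items()}
    return sorted(mean_pos, key=lambda name: (mean_pos[name], name))


# Toy, hypothetical data: 6 patients, binary class label, discrete features.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "tumor_size": [0, 0, 0, 1, 1, 1],         # matches the label exactly
    "metastasis_status": [0, 0, 1, 1, 1, 1],  # mostly agrees with the label
    "age": [0, 1, 0, 1, 0, 1],                # weakly related
}
mi_ranking = rank_by_mi(features, labels)
# Hypothetical stand-ins for rankings produced by other selectors.
other_rankings = [
    ["metastasis_status", "tumor_size", "age"],
    ["tumor_size", "age", "metastasis_status"],
]
print(mi_ranking)  # ['tumor_size', 'metastasis_status', 'age']
print(positional_ensemble([mi_ranking] + other_rankings))  # ['tumor_size', 'metastasis_status', 'age']
```

Mean position is used here because it rewards features that rank consistently well across selectors rather than features that one selector happens to place first; a real pipeline would also need continuous features discretized (or a continuous MI estimator) before this MI score applies.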