An ensemble algorithm integrating consensus-clustering with feature weighting based ranking and probabilistic fuzzy logic-multilayer perceptron classifier for diagnosis and staging of breast cancer using heterogeneous datasets
Subhashis Chatterjee, Ananya Das
DOI: https://doi.org/10.1007/s10489-022-04157-0
IF: 5.3
2022-10-19
Applied Intelligence
Abstract: Breast cancer is a major threat, predominantly affecting the female population. Staging of cancer enables early detection and prognosis of patients, which helps determine efficient and accurate treatment. Consequently, simplified models are required that integrate heterogeneous data to derive knowledge about patients for further treatment, and developing machine-learning-based diagnostic techniques is the predominant need for this goal. Prompted by these facts, a novel diagnostic model for breast cancer staging is developed that combines ensemble clustering, feature-weighting-based ranking of clusters, and ensemble classification into benign or malignant classes. The proposed work consists of five phases: data pre-processing, feature selection, ensemble clustering, ensemble classification, and staging of cancer. It first employs Multiple Imputation by Chained Equations (MICE) to impute missing values, followed by a proposed feature selection technique that uses Association Rules, Classification and Regression Tree, and Fuzzy Logic. Subsequently, a consensus-based coupled clustering and classification algorithm is developed that clusters features from the different datasets using a Self-Organizing Map and a Decision Tree. A hierarchical-clustering-based ranking of these clusters, using Multilinear Regression and a Modified Fuzzy Analytical Hierarchical Process, is proposed to prioritize features. Next, a staged classifier integrating Probabilistic Fuzzy Logic and a Multilayer Perceptron is developed, followed by feature-extraction-based staging of cancer. Finally, the proposed work is validated on four datasets with various performance metrics, using different combinations of training and test datasets. Moreover, k-fold cross-validation is implemented to reduce bias in the evaluation. A detailed analysis of the results shows that the proposed work outperforms other state-of-the-art methods in the literature.
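The abstract lists Association Rules as one ingredient of the feature selection step but does not state the rule form or the thresholds used. As a minimal sketch, assuming the features have already been discretized into hypothetical `feature=value` tokens, the support and confidence of a rule such as `size=large -> class=malignant` can be computed as follows. The records, token names and the helper `rule_support_confidence` are illustrative only and are not taken from the paper.

```python
def rule_support_confidence(records, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent.

    records: list of sets of discretized items, e.g. {"size=large", "class=malignant"}.
    antecedent, consequent: sets of items.
    """
    n = len(records)
    # Records in which every antecedent item occurs.
    matching = [r for r in records if antecedent <= r]
    # Of those, records that also contain every consequent item.
    both = [r for r in matching if consequent <= r]
    support = len(both) / n if n else 0.0
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical toy records; token names are illustrative only.
records = [
    {"size=large", "shape=irregular", "class=malignant"},
    {"size=large", "shape=regular", "class=malignant"},
    {"size=small", "shape=regular", "class=benign"},
    {"size=large", "shape=irregular", "class=benign"},
]

if __name__ == "__main__":
    for item in ("size=large", "shape=irregular"):
        s, c = rule_support_confidence(records, {item}, {"class=malignant"})
        print(f"{item} -> class=malignant: support={s:.2f}, confidence={c:.2f}")
```

On these toy records the first rule has support 0.50 and confidence 0.67, and the second has support 0.25 and confidence 0.50. Ranking candidate features by the confidence of their rules toward the malignant class is one plausible way to use such rules. The abstract does not say how the paper combines them with the Classification and Regression Tree and Fuzzy Logic.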
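The cluster ranking relies on a "Modified Fuzzy Analytical Hierarchical Process", and the abstract does not describe the modification. The sketch below is therefore not the authors' method. It substitutes the classic crisp Analytic Hierarchy Process (AHP) and derives priority weights from a pairwise comparison matrix using the row geometric-mean method. Fuzzy variants replace each matrix entry with a fuzzy number. The three "feature clusters" and their comparison values are hypothetical.

```python
import math


def ahp_priority_weights(pairwise):
    """Crisp AHP priority weights via the row geometric-mean method.

    pairwise[i][j] is how strongly criterion i is preferred over criterion j
    (Saaty's 1-9 scale); a consistent input is reciprocal, a[j][i] == 1 / a[i][j].
    """
    n = len(pairwise)
    if n == 0 or any(len(row) != n for row in pairwise):
        raise ValueError("pairwise matrix must be square and non-empty")
    # Geometric mean of each row, then normalize so the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparison of three feature clusters A, B, C.
pairwise = [
    [1, 3, 5],
    [1 / 3, 1, 3],
    [1 / 5, 1 / 3, 1],
]

if __name__ == "__main__":
    print([round(w, 3) for w in ahp_priority_weights(pairwise)])
```

Run as a script, this prints `[0.637, 0.258, 0.105]`, so cluster A would be ranked first. The geometric-mean method is used here because it needs only `math.prod`, unlike the principal-eigenvector method.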
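The abstract reports validation "with various performance metrics" but does not name them. Accuracy, precision, recall (sensitivity) and F1 computed from confusion-matrix counts are a common set for a benign/malignant decision. The sketch assumes `malignant` is the positive class. The label lists and the helper `binary_metrics` are hypothetical and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, not results from the paper.
y_true = ["malignant", "malignant", "benign", "benign", "malignant"]
y_pred = ["malignant", "benign", "benign", "malignant", "malignant"]

if __name__ == "__main__":
    print(binary_metrics(y_true, y_pred))
```

These toy labels give 2 true positives, 1 false positive, 1 false negative and 1 true negative. That yields an accuracy of 0.6 and a precision, recall and F1 of about 0.667 each.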
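The abstract says k-fold cross-validation is used but does not give k, the shuffling scheme, or whether folds are stratified. The following minimal sketch assumes plain shuffled k-fold splitting. It partitions sample indices so that every sample lands in exactly one test fold, and fold sizes differ by at most one. The helper name `k_fold_indices`, the value k = 3 and the fixed seed are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint folds (shuffled once).

    Returns a list of (train_indices, test_indices) pairs; every sample
    appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((sorted(train), sorted(test)))
        start += size
    return splits


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print(len(train), len(test))
```

With 10 samples and k = 3, the test folds contain 4, 3 and 3 samples. When benign and malignant classes are imbalanced, a stratified variant that preserves class proportions in each fold is a common alternative. The abstract does not say which variant the paper uses.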
computer science, artificial intelligence