FEATURE SELECTION USING EXTRA TREES CLASSIFIER FOR PARKINSON’S DISEASE CLASSIFICATION

Gauri Sabherwal
DOI: https://doi.org/10.26782/jmcms.spl.11/2024.05.00010
2024-05-24
JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES
Abstract: Parkinson's disease (PD) is a chronic, permanent, and life-threatening disorder. Neuroprotective treatments for PD rely on early detection. Recent studies have shown that clinical data, cerebrospinal fluid (CSF)-based proteomes, and gene mutations are important biomarkers for the accurate and early detection of PD. This study investigates heterogeneous data made up of CSF-based clinical data, CSF-based proteomic analysis data, and mutation information for the genes glucosylceramidase beta (GBA) and leucine-rich repeat kinase 2 (LRRK2), and uses it to classify subjects as PD-affected or Healthy Control (HC). The dataset contains 1103 subjects (569 PD-affected and 534 HC). An Automated Machine Learning (AutoML) framework built on PyCaret is used. The study proposes an Extra Trees Classifier (ETC) as a feature selection mechanism that selects the features with the strongest influence on PD classification. The selected features are then used to train Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT) classifiers. Classifier performance is evaluated with accuracy, sensitivity, specificity, the area under the receiver operating characteristic curve (AUC-ROC), and the confusion matrix. RF achieved the best performance, with an accuracy of 96.12%, a sensitivity of 95.59%, and a specificity of 95.34%, while LR achieved the highest AUC, 98.33%. RF also made the highest number of correct predictions: 316 out of 331.
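The pipeline described in the abstract can be sketched as follows: Extra Trees feature selection, followed by RF, LR, and DT classifiers scored by accuracy, sensitivity, specificity, AUC-ROC, and the confusion matrix. This is a minimal illustration under stated assumptions, not the author's code. It calls scikit-learn directly instead of going through PyCaret. It runs on synthetic data from `make_classification`, sized to the cohort (1103 subjects, roughly 569 PD and 534 HC), rather than the real CSF clinical, proteomic, and GBA/LRRK2 mutation features. The median-importance selection threshold, the 30% hold-out (chosen to mirror PyCaret's default `train_size=0.7`), and all hyperparameters are assumptions.

```python
# Minimal sketch (ASSUMPTIONS): scikit-learn stands in for the paper's PyCaret
# AutoML setup, and the data are SYNTHETIC, shaped like the study's cohort
# (1103 subjects, label 1 = PD, label 0 = HC). Thresholds and hyperparameters
# are illustrative choices, not values from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heterogeneous feature table (~534/1103 are HC).
X, y = make_classification(
    n_samples=1103, n_features=40, n_informative=8, n_redundant=4,
    weights=[534 / 1103], random_state=0,
)

# 70/30 stratified split (assumed; mirrors PyCaret's default train_size=0.7).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: fit an Extra Trees ensemble on the training split only and keep the
# features whose impurity-based importance is >= the median importance.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=200, random_state=0), threshold="median"
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 2: the three downstream classifiers named in the abstract.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "DT": DecisionTreeClassifier(random_state=0),
}

# Step 3: evaluate each model on the held-out subjects.
results = {}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    y_pred = model.predict(X_test_sel)
    y_score = model.predict_proba(X_test_sel)[:, 1]  # probability of PD
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": tp / (tp + fn),  # recall on the PD class
        "specificity": tn / (tn + fp),  # recall on the HC class
        "auc": roc_auc_score(y_test, y_score),
        "correct": int(tp + tn),
        "n_test": int(tp + tn + fp + fn),
    }

print(f"Selected {selector.get_support().sum()} of {X.shape[1]} features")
for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.4f} sens={m['sensitivity']:.4f} "
          f"spec={m['specificity']:.4f} auc={m['auc']:.4f} "
          f"correct={m['correct']}/{m['n_test']}")
```

The selector is fitted on the training split only, so the Extra Trees importances never see the held-out subjects. Selecting features on the full dataset before splitting would leak test information into the reported metrics.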