A novel approach to perform linear discriminant analyses for a 4-way Alzheimer's disease diagnosis based on an integration of Pearson's correlation coefficients and empirical cumulative distribution function

Besma Mabrouk, Ahmed Ben Hamida, Noura Mabrouki, Nouha Bouzidi, Chokri Mhiri
DOI: https://doi.org/10.1007/s11042-024-18532-1
IF: 2.577
2024-02-21
Multimedia Tools and Applications
Abstract: Diagnosing Alzheimer's disease (AD) remains a significant challenge, particularly when identifying individuals in the early (EMCI) and late (LMCI) stages of Mild Cognitive Impairment (MCI) among normal control (CN) subjects. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and relevant datasets, we aim to establish a 4-way framework for multi-class diagnosis. Linear Discriminant Analysis (LDA), often coupled with Principal Component Analysis (PCA), has conventionally served as a method for supervised classification. This paper instead introduces an alternative approach that uses Pearson's correlation coefficient (PCC) in place of PCA. We integrate the optimal LDA subspace with the PCC method, primarily to address the singularity issue that arises when dealing with an underdetermined dataset. Our methodology comprises three main steps. First, we preprocess 237 Diffusion Tensor and Magnetic Resonance brain images to map brain connectivity and extract connections within and between hemispheres. Second, we calculate correlation coefficients between features and classes and construct empirical cumulative distribution functions (ECDF). Features exceeding a predetermined percentile in the ECDF, which guarantees the non-singularity of the within-class variance matrix, are then selected and assessed using a primary classifier. Third, the top k features, those linked to the highest classification accuracy, are mapped into the LDA space through 100 iterations of five-fold cross-validation. Following each trial, we assess the performance of six machine learning algorithms, selecting the Logistic Regression classifier to gauge the reliability of our proposed method. The proposed method achieved an average accuracy of 65.46% ± 1.94%, a significant improvement over the commonly used PCA+LDA approach, which achieved 50.71% ± 2.1%.
Notably, our work achieved 100% accuracy in diagnosing the LMCI class, surpassing other methods. Furthermore, in a separate experiment on the within-hemisphere and between-hemisphere datasets, we identified connectivity between hemispheres as a pivotal biomarker for disease diagnosis in a medical context.
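The feature-selection step described in the abstract (correlate each feature with the class, build an ECDF of the correlation scores, keep the features above a chosen percentile) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes integer-coded class labels (0 to 3), absolute PCC values as scores, synthetic data in place of the ADNI connectivity features, and a hypothetical threshold of 0.98; the function names are illustrative only.

```python
import numpy as np

def pcc_with_labels(X, y):
    """Absolute Pearson correlation between each column of X and the labels y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; assign them a correlation of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (Xc.T @ yc) / denom, 0.0)
    return np.abs(r)

def ecdf(values):
    """Empirical CDF at each value: fraction of entries less than or equal to it."""
    sorted_vals = np.sort(values)
    return np.searchsorted(sorted_vals, values, side="right") / values.size

def select_by_ecdf(X, y, percentile):
    """Indices of features whose score lies above the given ECDF level."""
    scores = pcc_with_labels(X, y)
    return np.flatnonzero(ecdf(scores) > percentile)

# Synthetic stand-in data (assumption): 237 subjects, 500 features, 4 classes.
rng = np.random.default_rng(0)
n_subjects, n_features, n_classes = 237, 500, 4
y = rng.integers(0, n_classes, size=n_subjects)
X = rng.normal(size=(n_subjects, n_features))
X[:, :5] += 2.0 * y[:, None]  # make the first five features class-dependent

selected = select_by_ecdf(X, y, percentile=0.98)
print("selected feature indices:", selected)
```

Because the rank of the within-class scatter matrix is at most the number of samples minus the number of classes, keeping fewer features than that bound is a necessary condition for the matrix to be invertible; raising the percentile is one way to stay under it. The selected columns `X[:, selected]` could then be passed to an LDA projection and a downstream classifier.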
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering