Autoencoder Imputation of Missing Heterogeneous Data for Alzheimer's Disease Classification

Namitha Thalekkara Haridas, Jose M Sanchez-Bornot, Paula L McClean, KongFatt Wong-Lin
DOI: https://doi.org/10.1101/2024.07.18.24310625
2024-07-18
Abstract: Accurate diagnosis of Alzheimer's disease (AD) relies heavily on the availability of complete and reliable data. Yet missingness in heterogeneous medical and clinical data is prevalent and poses significant challenges. Previous studies have explored various imputation strategies and methods for heterogeneous data, but the evaluation of deep learning algorithms for imputing heterogeneous AD data remains limited. In this study, we addressed this gap by investigating the efficacy of denoising autoencoder-based imputation of missing key features in a heterogeneous dataset comprising tau-PET, MRI, cognitive and functional assessments, genotype, sociodemographic data, and medical history. We focused on extreme levels (40-70%) of data missing at random in key features that depend on AD progression, which we identified as history of the mother having AD, APOE ε4 alleles, and clinical dementia rating. In addition to features selected using traditional feature selection methods, we included latent features extracted from the denoising autoencoder for subsequent classification. Using random forest classification with 10-fold cross-validation, we evaluated the AD predictive performance of the imputed datasets and found robust classification performance, with accuracy of 79-85% and precision of 71-85% across different levels of missingness. Additionally, our results demonstrated high recall for identifying individuals with AD, particularly in datasets with 40% missingness in key features. Further, the dataset obtained with feature selection methods, including the autoencoder, achieved a higher classification score than the original complete dataset. These results highlight the effectiveness and robustness of autoencoders in imputing crucial information for reliable AD prediction in AI-based clinical decision support systems.
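The imputation idea summarized in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the architecture (one tanh hidden layer with a linear decoder), the input-dropout corruption, the hyperparameters, the helper names `mask_at_random` and `train_dae_imputer`, and the synthetic data are all hypothetical. Missing entries start at the column mean, the network learns to reconstruct observed entries from randomly corrupted inputs, and its reconstructions then fill the gaps; the hidden activations serve as latent features. The masking helper drops values uniformly at random, a simplification of the paper's missing-at-random setting.

```python
# Hypothetical sketch: names, architecture and hyperparameters are illustrative assumptions.
import numpy as np


def mask_at_random(X, cols, rate, rng):
    """Copy X and set a fraction `rate` of rows to NaN in each column of `cols`.

    Rows are drawn uniformly and independently per column (an MCAR-style
    simplification of the missing-at-random setting).
    """
    X = np.array(X, dtype=float)
    n_missing = int(round(rate * X.shape[0]))
    for c in cols:
        rows = rng.choice(X.shape[0], size=n_missing, replace=False)
        X[rows, c] = np.nan
    return X


def train_dae_imputer(X, n_hidden=4, corrupt=0.2, lr=0.1, epochs=500, seed=0):
    """Fit a one-hidden-layer denoising autoencoder on data containing NaNs.

    Returns a function mapping a NaN-containing array to
    (imputed array, latent features).
    """
    rng = np.random.default_rng(seed)
    mu, sd = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    Z = (X - mu) / sd                                  # z-score each column
    observed = ~np.isnan(Z)
    Z0 = np.where(observed, Z, 0.0)                    # missing -> column mean (0 after z-scoring)
    d = Z0.shape[1]
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    m = observed.sum()
    for _ in range(epochs):
        x_in = Z0 * (rng.random(Z0.shape) >= corrupt)  # denoising: randomly zero some inputs
        H = np.tanh(x_in @ W1 + b1)                    # encoder -> latent features
        R = H @ W2 + b2                                # linear decoder
        g_R = 2.0 * np.where(observed, R - Z0, 0.0) / m  # MSE gradient on observed entries only
        g_H = (g_R @ W2.T) * (1.0 - H ** 2)            # backprop through tanh
        W2 -= lr * (H.T @ g_R)
        b2 -= lr * g_R.sum(axis=0)
        W1 -= lr * (x_in.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    def impute(X_new):
        X_new = np.array(X_new, dtype=float)
        obs = ~np.isnan(X_new)
        z = np.where(obs, (X_new - mu) / sd, 0.0)
        H = np.tanh(z @ W1 + b1)
        R = (H @ W2 + b2) * sd + mu                    # reconstruction in original units
        return np.where(obs, X_new, R), H              # keep observed values, fill the rest

    return impute


# Usage on synthetic data: 6 correlated features, 40% missing in the first 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
X_missing = mask_at_random(X, cols=[0, 1, 2], rate=0.4, rng=rng)
impute = train_dae_imputer(X_missing)
X_imp, latent = impute(X_missing)
```

In practice, binary or ordinal features such as maternal AD history, APOE ε4 allele count, or clinical dementia rating would need their reconstructions rounded or mapped back to valid categories. The `latent` activations are the kind of autoencoder features that could be appended to conventionally selected features before a classifier such as a random forest.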