A supervised machine learning approach to predicting mild cognitive impairment among diverse Hispanics/Latinos: The SOL‐INCA Study

Sayaka Kuwayama, Kevin A González, Freddie Márquez, Linda C Gallo, Hector M González, Wassim Tarraf
DOI: https://doi.org/10.1002/alz.077063
2023-12-01
Abstract:

Background: Existing literature highlights the need for better risk assessment of mild cognitive impairment (MCI) to mitigate the onset and burden of Alzheimer’s disease and related dementias (ADRD) through earlier interventions. Hispanics/Latinos are the largest US ethnic/racial minority and are at higher risk of developing ADRD than non‐Hispanic Whites, yet they are underrepresented in dementia‐related research. This study develops prediction models of MCI from targetable protective and risk factors in diverse, middle‐aged Hispanics/Latinos in the US.

Method: We use data (n = 4246, average baseline age = 56 years) from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL; 2008‐2011), a multisite prospective cohort study of diverse Hispanics/Latinos, and its ancillary study, the SOL‐Investigation of Neurocognitive Aging (SOL‐INCA). Our outcome is prevalent MCI at the SOL‐INCA visit (on average 7 years after baseline), operationalized using National Institute on Aging‐Alzheimer’s Association criteria. Using nearly 40 baseline indicators representing (1) sociodemographic characteristics, (2) childhood factors, (3) acculturation factors, (4) biological and (5) behavioral markers, and (6) mental and (7) functional health, we compared a supervised machine learning technique (Random Forest; RF) with a standard statistical technique (logistic regression; LR) in identifying predictors. Additionally, to evaluate RF as a feature selection tool, we estimated the Area Under the Curve (AUC) from LR using all factors and using only the subset of factors identified as important by RF.

Result: LR identified socioeconomic conditions and mental and physical health scores as among the leading factors predicting MCI. RF identified a larger subset of predictors and differed from LR in ranking the relative importance of common predictors. Removing the 17 factors that RF identified as unimportant for predicting MCI slightly improved classification performance in LR (AUC with all factors = 0.692; AUC with the RF‐selected subset = 0.697).

Conclusion: Supervised machine learning (ML) identified different predictors of MCI than standard methods did and was compelling as a feature selection method. This suggests that ML has potential value for focusing MCI prediction on the most important predictors. Substantively, our findings emphasize the importance of accounting for life‐course risk and protective factors when predicting cognitive health. Further investigations are needed to test additional ML modalities and to better understand how complex interactions among life‐course, multi‐domain factors can be addressed through interventions.
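The comparison described in the abstract (fit LR on all factors, use RF to rank factors, then refit LR on the RF-selected subset and compare AUC) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the scikit-learn estimators, the impurity-based importance ranking, the rule of dropping the 17 lowest-ranked features, the 70/30 split, and the function name `compare_lr_auc` are all assumptions, since the abstract does not say how importance was measured or thresholded.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compare_lr_auc(n_samples=2000, n_features=40, n_drop=17, seed=0):
    """Compare LR test AUC on all features vs. an RF-selected subset.

    Hypothetical helper: data are synthetic and the selection rule
    (drop the n_drop features with the lowest RF importance) is assumed.
    """
    # Synthetic stand-in for ~40 baseline indicators and a rare binary outcome.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=8,
        n_redundant=4,
        weights=[0.9, 0.1],
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Rank features by impurity-based importance from a Random Forest.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X_train, y_train)
    ascending = np.argsort(rf.feature_importances_)
    keep = np.sort(ascending[n_drop:])  # indices of retained features

    def lr_auc(cols):
        # Fit logistic regression on the chosen columns, score on held-out data.
        lr = LogisticRegression(max_iter=1000)
        lr.fit(X_train[:, cols], y_train)
        scores = lr.predict_proba(X_test[:, cols])[:, 1]
        return roc_auc_score(y_test, scores)

    auc_all = lr_auc(np.arange(n_features))
    auc_subset = lr_auc(keep)
    return keep, auc_all, auc_subset


keep, auc_all, auc_subset = compare_lr_auc()
print(f"features kept: {len(keep)}")
print(f"LR AUC, all features: {auc_all:.3f}")
print(f"LR AUC, RF subset:    {auc_subset:.3f}")
```

In this sketch the ranking is computed on the training split only, so the held-out AUC is not influenced by the selection step. The AUC values it prints come from synthetic data and carry no information about the study's results.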
clinical neurology