Abstract:

BACKGROUND AND OBJECTIVES: Identifying individuals at high risk of developing Parkinson disease (PD) several years before diagnosis is crucial for developing treatments to prevent or delay neurodegeneration. This study aimed to develop predictive models for PD risk that combine plasma proteins with easily accessible clinical-demographic variables.

METHODS: Using data from the UK Biobank (UKB), which recruited participants across the United Kingdom, we conducted a longitudinal study to identify predictors of incident PD. Participants with baseline plasma protein measurements and no PD at baseline were included. Through machine learning, we narrowed down predictors from a pool of 1,463 plasma proteins and 93 clinical-demographic variables. These predictors were then externally validated in the Parkinson's Progression Markers Initiative (PPMI) cohort. To further investigate the temporal trends of the predictors, a nested case-control study was conducted within the UKB.

RESULTS: A total of 52,503 participants without PD (median age 58 years, 54% female) were included. Over a median follow-up of 14.0 years, 751 individuals were diagnosed with PD (median age 65 years, 37% female). Using a forward selection approach, we selected a panel of 22 plasma proteins for optimal prediction. A model based on this panel, built with an ensemble tree-based Light Gradient Boosting Machine (LightGBM) algorithm, achieved an area under the receiver operating characteristic curve (AUC) of 0.800 (95% CI 0.785-0.815). The LightGBM prediction model integrating both plasma proteins and clinical-demographic variables demonstrated enhanced predictive accuracy, with an AUC of 0.832 (95% CI 0.815-0.849). Key predictors included age, years of education, history of traumatic brain injury, and serum creatinine. The incorporation of 11 plasma proteins (neurofilament light, integrin subunit alpha V, hematopoietic PGD synthase, histamine N-methyltransferase, tubulin polymerization promoting protein family member 3, ectodysplasin A2 receptor, latexin, interleukin-13 receptor subunit alpha-1, BAG family molecular chaperone regulator 3, tryptophanyl-tRNA synthetase, and secretogranin-2) augmented the model's predictive accuracy. External validation in the PPMI cohort confirmed the model's reliability, producing an AUC of 0.810 (95% CI 0.740-0.873). Notably, alterations in these predictors were detectable several years before the diagnosis of PD.

DISCUSSION: Our findings support the potential utility of a machine learning-based model integrating clinical-demographic variables with plasma proteins to identify individuals at high risk for PD within the general population. Although these predictors have been validated in the PPMI cohort, additional validation in a more diverse population reflective of the general community is essential.
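
The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were computed. As a rough illustration only, the sketch below computes an AUC from risk scores using the rank (Mann-Whitney) formulation and attaches a percentile bootstrap interval. The bootstrap, the function names `auc` and `bootstrap_auc_ci`, and the toy data are assumptions made for this example and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    labels: 0/1 integers (1 = incident PD in this illustration).
    scores: predicted risks; tied scores receive the average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U statistic divided by the number of case-control pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap interval (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack cases or controls
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(labels, scores), lower, upper


if __name__ == "__main__":
    # Toy data, purely illustrative: cases tend to receive higher scores.
    rng = random.Random(42)
    labels = [1] * 30 + [0] * 270
    scores = [rng.gauss(1.0, 1.0) for _ in range(30)] + [
        rng.gauss(0.0, 1.0) for _ in range(270)
    ]
    estimate, lower, upper = bootstrap_auc_ci(labels, scores)
    print(f"AUC {estimate:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With few cases relative to controls, as in a cohort where 751 of 52,503 participants develop PD, the width of such an interval is driven mainly by the number of cases, which is consistent with the wider interval reported for the smaller PPMI validation set.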

Machine learning based risk prediction for Parkinson's disease with nationwide health screening data

Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database

Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data

Constructing Prediction Models for Excessive Daytime Sleepiness by Nomogram and Machine Learning: A Large Chinese Multicenter Cohort Study

Predicting the progression of Parkinson's disease using conventional MRI and machine learning: An application of radiomic biomarkers in whole-brain white matter

Prediction of Future Parkinson Disease Using Plasma Proteins Combined with Clinical-Demographic Measures

Machine Learning–Based Prediction of Neurodegenerative Disease in Patients With Type 2 Diabetes by Derivation and Validation in 2 Independent Korean Cohorts: Model Development and Validation Study

Machine Learning for Early Detection of Cognitive Decline in Parkinson's Disease Using Multimodal Biomarker and Clinical Data

An Improved Approach for Prediction of Parkinson's Disease using Machine Learning Techniques

Further evidence that the Hajdu-Cheney syndrome and the "serpentine fibula-polycystic kidney syndrome" are a single entity.

A Machine Learning Approach for Early Identification of Prodromal Parkinson's Disease

Kinship and Diversification of Bacterial Penicillin-Binding Proteins and β-Lactamases

Deep Learning Prediction of Parkinson's Disease using Remotely Collected Structured Mouse Trace Data

ESDC-LSH: Ensemble Support-Vector Deep Convolutional Based Levy Selfish Herd Optimization for Prediction and Classification of Parkinson's Disease

Parkinson’s Disease Prediction Through Machine Learning Techniques

Parkinson Disease Prediction Using Machine Learning Algorithm

[Conclusiveness of various diagnostic procedures for squamous cell carcinomas of the visceral cranium and the oral cavity as compared with histological evaluations].

Development of a depression in Parkinson's disease prediction model using machine learning

Parkinson’s Disease Prediction Using Machine Learning

Machine Learning Models for Parkinson Disease: Systematic Review

Prediction of individual progression rate in Parkinson's disease using clinical measures and biomechanical measures of gait and postural stability