Interpretable machine learning classifiers implicate GPC6 in Parkinson's disease from single-nuclei midbrain transcriptomes

Michael R. Fiorini, Jialun Li, Edward A. Fon, Sali M.K. Farhan, Rhalena A. Thomas
DOI: https://doi.org/10.1101/2024.11.19.24317547
2024-11-20
Abstract: Parkinson's disease (PD) is a progressive and devastating neurodegenerative disease. An incomplete understanding of its genetic architecture remains a major barrier to the clinical translation of targeted therapeutics, necessitating novel approaches to uncover elusive genetic determinants. Single-cell and single-nuclear RNA sequencing (scnRNAseq) can help bridge this gap by profiling individual cells for disease-associated differential gene expression and nominating genes for targeted genomic analyses. Here, we introduce a machine learning framework to identify molecular features that characterize post-mortem brain cells from PD patients. We train classifiers to distinguish between PD and healthy cells, then decode the models to unravel the 'reasons' behind the classifications, revealing key gene expression signatures that characterize cells from the parkinsonian brain. Application of this framework to three publicly available snRNAseq datasets characterizing the post-mortem midbrain identified cell-type-specific gene sets that accurately classify PD cells across all datasets, demonstrating our approach's capacity to identify robust molecular markers of disease. Targeted genomic analyses of the key genes characterizing PD cells revealed a previously undescribed association between PD and rare variants in GPC6, a member of the heparan sulfate proteoglycan family, which has been implicated in the intracellular accumulation of alpha-synuclein preformed fibrils. We replicate this association in three separate case-control cohorts. Our method promises to enhance understanding of the genetic architecture of complex diseases like PD, representing a critical step toward targeted therapeutics. Our publicly available framework is readily applicable across diseases.
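The "train a classifier, then decode it" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the abstract does not name the classifier or the interpretation method, so this example swaps in a plain logistic regression and reads its weights as a stand-in for model decoding. The data are synthetic, and names such as `SIGNAL_GENE`, `make_cells`, and `train_logistic` are hypothetical.

```python
import math
import random

# Assumption: toy data only. One planted gene is shifted upward in "PD" cells.
N_CELLS, N_GENES, SIGNAL_GENE = 200, 10, 3


def make_cells(n_cells, n_genes, signal_gene, rng):
    """Simulate expression rows; label 1 = PD (hypothetical), 0 = control."""
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            row[signal_gene] += 2.0  # planted disease signal
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    # Clamp z to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Full-batch gradient descent on the logistic loss (stand-in classifier)."""
    n, n_genes = len(X), len(X[0])
    w, b = [0.0] * n_genes, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_genes, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(n_genes):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
X, y = make_cells(N_CELLS, N_GENES, SIGNAL_GENE, rng)
w, b = train_logistic(X, y)

# "Decode" the model: rank genes by the magnitude of their learned weight.
ranking = sorted(range(N_GENES), key=lambda j: abs(w[j]), reverse=True)

# Training accuracy of the fitted classifier on the toy cells.
preds = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5 else 0
         for row in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)

print("top-ranked gene index:", ranking[0])
print("training accuracy:", round(accuracy, 3))
```

In this toy setting the planted gene should dominate the ranking because it is the only feature that differs between the two groups. Real snRNAseq data would need normalization, held-out evaluation, and per-cell-type models, none of which are shown here.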