MU-BRAIN: MUltiethnic Brain Rna-seq for Alzheimer INitiative

Zikun Yang, Basilio Cieza, Dolly Reyes-Dumeyer, Annie Lee, Yiyi Ma, Elanur Yilmaz, Rafael Lantigua, Gary W Miller, Lewis M Brown, Lawrence Honig, Benjamin Ciener, Sandra Leskinen, Sharanya Sivakumar, Badri Vardarajan, Brittany N Dugger, Lee-Way Jin, Melissa E Murray, Dennis W Dickson, Robert A Rissman, Annie Hiniker, Margaret Pericak-Vance, Jeff Vance, Tatiana M Foroud, Caghan Kizil, Andrew F Teich, Richard Mayeux, Giuseppe Tosto
DOI: https://doi.org/10.1101/2024.02.20.581250
2024-03-21
Abstract: Alzheimer’s Disease (AD) exhibits a complex molecular and phenotypic profile. Investigating gene expression plays a crucial role in unraveling the disease’s etiology and progression. Transcriptome data across ethnic groups are lacking, which negatively impacts equity in intervention and research. We used 565 brains from six U.S. brain banks (n=399 non-Hispanic Whites, n=113 Hispanics, n=12 African Americans) to generate bulk RNA sequencing data from the prefrontal cortex. We sought to identify cross-ancestry and ancestry-specific differentially expressed genes (DEG) across Braak stages, adjusting for sex, age at death, and RNA quality metrics. We further validated our findings using brains from the Religious Orders Study/Memory and Aging Project (ROS/MAP; n=1,095) and performed a meta-analysis (n=1,660). We conducted Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA). We employed a machine-learning approach for phenotype prediction and gene prioritization to construct a polytranscriptomic risk score (PTRS), splitting our sample into training and testing sub-samples either randomly or by ethnicity (“ancestry-agnostic” and “ancestry-aware”, respectively). Lastly, we validated top DEG using single-nucleus RNA sequencing (snRNAseq) data. We identified several DEG associated with Braak staging: AD-known genes ( =3.78E-07) and ( =1.21E-04) were consistently differentially expressed across statistical models and ethnicities, and replicated in ROS/MAP. Genes from the heat shock protein ( ) family, e.g. ( =3.78E-07), were the top differentially expressed genes and replicated in ROS/MAP. Ethnicity-stratified analyses prioritized and as top DEG in Hispanics. GSEA highlighted “ ” ( =4.24E-06) and “ ” ( =1.68E-08) pathways. Up- and down-regulated genes were enriched in several pathways (e.g. “ , “ ”, “ ”). Ancestry-agnostic and ancestry-aware PTRS effectively classified brains (AUC=0.77 and 0.73, respectively) and replicated in ROS/MAP. snRNAseq validated prioritized genes, including downregulated in neurons; =1.1E-07). To our knowledge, this is the largest diverse AD transcriptome study in post-mortem brain tissue. We identified perturbed genes, pathways, and expression networks in AD brains, yielding both cross-ethnic and ethnic-specific findings and highlighting the diversity within AD pathogenesis. This underscores the need for an integrative and personalized approach in AD studies.
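The ancestry-agnostic versus ancestry-aware PTRS evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the machine-learning model, so the hypothetical `fit_weights` helper simply uses the difference in mean normalized expression between AD cases and controls as each gene's weight. The PTRS is taken to be a weighted sum of expression over the weighted genes, and AUC is computed as the Mann-Whitney probability that a case scores higher than a control. All function names, the sample tuple layout, and the 30% default test fraction are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample layout: (sample_id, ancestry, normalized expression by gene, label),
# where label is 1 for an AD case and 0 for a control (an assumption, not the paper's schema).
Sample = Tuple[str, str, Dict[str, float], int]


def split_agnostic(samples: Sequence[Sample], test_frac: float = 0.3,
                   seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Ancestry-agnostic split: shuffle all samples and hold out a random fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    k = int(round(len(shuffled) * test_frac))
    return shuffled[k:], shuffled[:k]  # (train, test)


def split_aware(samples: Sequence[Sample],
                test_ancestry: str) -> Tuple[List[Sample], List[Sample]]:
    """Ancestry-aware split: train on the other ancestries, test on one held-out ancestry."""
    train = [s for s in samples if s[1] != test_ancestry]
    test = [s for s in samples if s[1] == test_ancestry]
    return train, test


def fit_weights(train: Sequence[Sample], genes: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the ML model: weight = mean(cases) - mean(controls)."""
    cases = [s[2] for s in train if s[3] == 1]
    controls = [s[2] for s in train if s[3] == 0]
    if not cases or not controls:
        raise ValueError("training set needs at least one case and one control")
    return {
        g: sum(e[g] for e in cases) / len(cases) - sum(e[g] for e in controls) / len(controls)
        for g in genes
    }


def ptrs(expression: Dict[str, float], weights: Dict[str, float]) -> float:
    """Polytranscriptomic risk score as a weighted sum; genes missing from a sample count as 0."""
    return sum(w * expression.get(g, 0.0) for g, w in weights.items())


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(train: Sequence[Sample], test: Sequence[Sample], genes: Sequence[str]) -> float:
    """Fit weights on the training split and report the AUC on the test split."""
    weights = fit_weights(train, genes)
    return auc([ptrs(s[2], weights) for s in test], [s[3] for s in test])
```

With real data one would call `evaluate(*split_agnostic(samples), genes)` for the ancestry-agnostic setting and `evaluate(*split_aware(samples, "Hispanic"), genes)` to hold out one ancestry group (the ancestry label string is hypothetical). Keeping the split functions separate from the scoring makes it easy to compare both settings with the same model.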
Genetics
What problem does this paper attempt to address?
The paper mainly addresses the molecular and phenotypic complexity of Alzheimer's disease (AD), with a focus on how gene expression differs among ethnic groups. Specifically, it generates bulk RNA-sequencing data from the prefrontal cortex of 565 brains (non-Hispanic Whites, Hispanics, and African Americans) from multiple U.S. brain banks in order to identify differentially expressed genes (DEG), both shared across ancestries and ancestry-specific. Analyses were adjusted for sex, age at death, and RNA quality metrics to reduce confounding. The paper also addresses the following key problems:
1. **Identification of cross-ancestry and ancestry-specific differentially expressed genes**: By analyzing brains from donors of different ancestries, the researchers aim to identify genes whose expression changes across Braak stages, distinguishing those shared across ancestries from those specific to a given ancestry.
2. **Validation of findings**: The findings were replicated in brains from the Religious Orders Study/Memory and Aging Project (ROS/MAP), and a meta-analysis was performed to confirm their robustness and generalizability.
3. **Functional enrichment analysis**: Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA) were used to explore the functions of the differentially expressed genes and the biological pathways involved, particularly those related to AD.
4. **Application of machine learning**: A machine-learning approach was used for phenotype prediction and gene prioritization to construct a polytranscriptomic risk score (PTRS), whose ability to classify brains (measured by AUC) was evaluated with training and testing sets split either randomly (ancestry-agnostic) or by ethnicity (ancestry-aware).
5. **Validation with single-nucleus RNA-sequencing data**: Single-nucleus RNA-sequencing (snRNAseq) data were used to validate the prioritized genes, confirming their expression changes at the cell-type level.
In summary, through a large-scale, multi-ethnic transcriptomic study, the paper aims to reveal the molecular mechanisms of AD, particularly the differences among ethnic groups, thereby supporting more comprehensive and personalized approaches to AD research and treatment.
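The meta-analysis step, which combines the MU-BRAIN and ROS/MAP results, can be illustrated with a standard inverse-variance weighted fixed-effect model. The source does not state which meta-analytic model the authors used, so this is a minimal sketch assuming each cohort supplies, for a given gene, an effect estimate (for example, a log2 fold change per Braak stage) and its standard error; `fixed_effect_meta` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis for one gene.

    Each cohort i contributes an effect beta_i with standard error se_i and
    weight w_i = 1 / se_i**2. Returns (pooled beta, pooled SE, z, two-sided p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    # Pooled effect: weighted mean of the per-cohort effects.
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    # Pooled standard error: sqrt(1 / sum of weights).
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

For example, `fixed_effect_meta([b_mubrain, b_rosmap], [se_mubrain, se_rosmap])` would pool one gene's estimates from the two cohorts (the variable names are hypothetical). A fixed-effect model assumes a single true effect shared by all cohorts; when effects may differ between cohorts, a random-effects model such as DerSimonian-Laird is the usual alternative.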