Leveraging genomic large language models to enhance causal genotype-brain-clinical pathways in Alzheimer's disease

Qiao Liu, Wanwen Zeng, Hongtu Zhu, Lexin Li, Wing Hung Wong
DOI: https://doi.org/10.1101/2024.10.03.24314824
2024-10-22
Abstract: Genome-wide association studies (GWAS) have identified numerous Alzheimer's disease (AD)-associated variants. However, how these variants contribute to the etiology of AD remains largely elusive. Recent advances in genomic large language models (LLMs) offer new opportunities to interpret the genetic variation observed in personal genomes. In this study, we propose epiBrainLLM, a novel computational framework that leverages a genomic LLM to enhance our understanding of the causal pathways from genotypes to brain measures to AD-related clinical phenotypes. epiBrainLLM first converts the personal DNA sequence into a diverse set of genomic and epigenomic features using a pretrained genomic LLM, and then uses these features to predict phenotypes. Across various experimental settings, epiBrainLLM significantly improves causal analysis compared to the traditional genotype association approach. We conclude that epiBrainLLM provides a novel perspective for understanding the regulatory mechanisms underlying AD etiology, potentially offering insights into complex disease mechanisms beyond AD.