LOL-EVE: Predicting Promoter Variant Effects from Evolutionary Sequences

Courtney Shearer, Felix Teufel, Rose Orenbuch, Daniel Ritter, Aviv Spinner, Erik Xie, Jonathan Frazer, Mafalda Dias, Pascal Notin, Debora Marks
DOI: https://doi.org/10.1101/2024.11.11.623015
2024-11-12
Abstract: Genetic studies reveal extensive disease-associated variation across the human genome, predominantly in noncoding regions such as promoters. Quantifying the impact of these variants on disease risk is crucial to understanding the underlying disease mechanisms and to advancing personalized medicine. However, current computational methods struggle to capture variant effects, particularly those of insertions and deletions (indels), which can significantly disrupt gene expression. To address this challenge, we present LOL-EVE (Language Of Life across EVolutionary Effects), a conditional autoregressive transformer model trained on 14.6 million diverse mammalian promoter sequences. Leveraging evolutionary information and proximal genetic context, LOL-EVE predicts indel variant effects in human promoter regions. We introduce three new benchmarks for indel variant effect prediction in promoter regions: identifying causal eQTLs, prioritizing rare variants in the human population, and understanding disruptions of transcription factor binding sites. We find that LOL-EVE achieves state-of-the-art performance on these tasks, demonstrating the potential of region-specific large genomic language models and offering a powerful tool for prioritizing potentially causal noncoding variants in disease studies.
Genomics