Abstract: Models that predict RNA levels from DNA sequences show tremendous promise for decoding tissue-specific gene regulatory mechanisms, revealing the genetic architecture of traits, and interpreting noncoding genetic variation. Existing methods take two different approaches: 1) associating expression with linear combinations of common genetic variants (training across individuals on single genes), or 2) learning genome-wide sequence-to-expression rules with neural networks (training across loci using a reference genome). Since limitations of both strategies have been highlighted recently, we sought to combine the sequence context provided by deep learning with the information provided by cross-individual training. We utilized fine-tuning to develop Performer, a model with accuracy approaching the cis-heritability of most genes. Performer prioritizes genetic variants across the allele frequency spectrum that disrupt motifs, fall in annotated regulatory elements, and have functional evidence for modulating gene expression. While obstacles remain in personalized expression prediction, our findings establish deep learning as a viable strategy.

What problem does this paper attempt to address?

The main goal of this paper is to predict gene expression from individual genomes using deep learning while overcoming some of the limitations of existing methods. Specifically, the paper addresses the following issues:

1. **Combining the advantages of two methods**: Existing methods predict gene expression in one of two ways. The first associates expression with a linear combination of common genetic variants, training across individuals on a single gene. The second uses neural networks to learn genome-wide sequence-to-expression rules, training across loci on a reference genome. Each approach has strengths and weaknesses. The paper aims to combine the sequence context provided by deep learning with the information provided by cross-individual training.
2. **Addressing the limitations of existing deep learning models**: Current deep learning models do not reliably explain variation in gene expression between individuals, and in particular often mispredict the direction of effect of expression quantitative trait loci (eQTLs).
3. **Developing a new deep learning model, Performer**: To overcome these limitations, the researchers developed a model called Performer. It uses a fine-tuning strategy to train across individuals, which improves performance. Performer better captures the cis-heritability of gene expression and prioritizes genetic variants that affect gene expression.
4. **Evaluating the performance of the new model**: Through experiments on a large number of samples from the GTEx dataset, the paper demonstrates that Performer outperforms existing deep learning models and linear models in predicting individual gene expression. Performer not only explains more of the heritability of expression but also correctly predicts the direction of the effect of genetic variants on gene expression.

In summary, the goal of this paper is to develop and evaluate a new deep learning model, Performer, that addresses the limitations of existing models in predicting gene expression from individual genomic data.
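To make the first, cross-individual approach concrete, the following is a minimal sketch of a per-gene linear model: expression is predicted as a weighted sum of variant genotypes, with weights fit across individuals. It is not Performer and not the baseline used in the paper. The data are simulated, ridge regression stands in for whatever penalized regression a real pipeline would use, and every name and parameter (`simulate_gene`, `fit_ridge`, `demo`, the heritability value, the sample sizes) is a hypothetical choice made for illustration. The sketch also computes the two quantities the summary refers to: the correlation between predicted and observed expression in held-out individuals, and whether the fitted weights have the same sign as the simulated variant effects.

```python
import numpy as np


def simulate_gene(n_individuals=400, n_variants=50, n_causal=3,
                  heritability=0.4, seed=0):
    """Simulate genotypes (0/1/2 allele counts) and expression for one gene.

    All values are synthetic; nothing here comes from GTEx or the paper.
    """
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=n_variants)  # minor allele frequencies
    genotypes = rng.binomial(2, maf, size=(n_individuals, n_variants)).astype(float)

    # Only a few variants truly affect expression.
    effects = np.zeros(n_variants)
    causal = rng.choice(n_variants, size=n_causal, replace=False)
    effects[causal] = rng.normal(0.0, 1.0, size=n_causal)

    genetic = genotypes @ effects
    # Scale the noise so that var(genetic) / var(total) is about `heritability`.
    noise_sd = np.sqrt(genetic.var() * (1.0 - heritability) / heritability)
    expression = genetic + rng.normal(0.0, noise_sd, size=n_individuals)
    return genotypes, expression, effects


def fit_ridge(genotypes, expression, alpha=1.0):
    """Closed-form ridge regression: one weight per variant, plus an intercept."""
    x_mean = genotypes.mean(axis=0)
    y_mean = expression.mean()
    xc = genotypes - x_mean
    yc = expression - y_mean
    n_variants = genotypes.shape[1]
    weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(n_variants), xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def demo(seed=0, n_train=300):
    """Train on some individuals, evaluate on the held-out rest."""
    genotypes, expression, effects = simulate_gene(seed=seed)
    weights, intercept = fit_ridge(genotypes[:n_train], expression[:n_train],
                                   alpha=10.0)
    predicted = genotypes[n_train:] @ weights + intercept

    # Cross-individual accuracy: Pearson correlation on held-out individuals.
    r = np.corrcoef(predicted, expression[n_train:])[0, 1]

    # Direction of effect: do fitted weights share the sign of the true effects?
    causal = effects != 0
    agreement = np.mean(np.sign(weights[causal]) == np.sign(effects[causal]))
    return float(r), float(agreement)


if __name__ == "__main__":
    r, agreement = demo()
    print(f"held-out Pearson r: {r:.2f}")
    print(f"sign agreement on causal variants: {agreement:.2f}")
```

In this simulation the squared held-out correlation is bounded above by the simulated heritability, which is the sense in which a model's accuracy can "approach" cis-heritability. A model of this kind sees only variants present in its training individuals. A sequence-based model, by contrast, can in principle score variants it has never observed.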

Deep-learning prediction of gene expression from personal genomes
