Interpreting Cis-Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics

Shushan Toneyan, Peter K Koo

DOI: https://doi.org/10.1101/2023.07.03.547592

2024-03-21

Abstract: The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover the rules of gene regulation that they learn. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the intricate complexity of higher-order CRE interactions, the effect of CRE distance from transcription start sites on gene expression, as well as the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.

Genomics

What problem does this paper attempt to address?

This paper addresses how to better understand and interpret large-scale deep neural networks (DNNs) that predict gene expression in genomics. It proposes a tool named CREME (Cis-Regulatory Element Model Explanations), which uses in silico (computer-simulated) perturbation experiments to explore the gene regulation rules learned by large DNNs. The problem can be subdivided into the following aspects:

1. **Evaluating and Interpreting DNN Predictions**: Current methods for evaluating DNN predictions mainly compare them with the results of experimental perturbation assays (such as massively parallel reporter assays and CRISPR interference). Although this shows how well the model generalizes within the studied regions, it says little about the mechanisms driving the predictions. Moreover, existing model interpretability tools focus mainly on motif analysis, which becomes complicated for longer sequences.
2. **Revealing the Role of cis-Regulatory Elements (CREs)**: With CREME, the researchers aim to understand how CREs directly enhance or silence the expression of target genes, and to characterize the complexity of higher-order CRE interactions.
3. **Exploring the Relationship between CREs and the Transcription Start Site (TSS) Distance**: The researchers use CREME to explore how the distance between a CRE and the TSS affects gene expression.
4. **Analyzing the Biochemical Characteristics of Enhancers and Silencers**: The researchers also aim to understand the biochemical properties of the enhancers and silencers learned by DNNs, including their epigenetic features and transcription factor binding.
5. **Improving the Resolution of Functional Sequence Elements**: Finally, CREME is intended to provide an efficient way to identify functional sequence elements within CREs at higher resolution, giving a more detailed view of the mechanisms of gene regulation.

Overall, by introducing CREME, this paper aims to overcome the limitations of existing methods for evaluating and interpreting large-scale DNNs that predict gene expression, providing new perspectives and tools for gene regulation research.
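
The in silico perturbation idea described above can be illustrated with a minimal sketch. This is not CREME's API and not Enformer: `toy_model`, `shuffle_tile`, and `tile_necessity` are hypothetical names introduced here, the "model" is a toy motif counter with an assumed distance-to-TSS decay, and the perturbation is a simple random shuffle of one tile, which may differ from the shuffling scheme the actual toolkit uses. The sketch only shows the general pattern: perturb a candidate region, re-run the model, and compare the prediction with the unperturbed one.

```python
import random


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a sequence-to-expression DNN (not Enformer).

    Each "GATA" motif adds to the predicted expression, with a weight that
    decays with distance from the TSS, assumed to sit at the sequence center.
    """
    tss = len(seq) // 2
    score = 0.1  # constant basal activity keeps the prediction nonzero
    pos = seq.find("GATA")
    while pos != -1:
        score += 1.0 / (1.0 + abs(pos - tss) / 100.0)
        pos = seq.find("GATA", pos + 1)
    return score


def shuffle_tile(seq: str, start: int, end: int, rng: random.Random) -> str:
    """Return seq with positions [start, end) randomly permuted."""
    tile = list(seq[start:end])
    rng.shuffle(tile)
    return seq[:start] + "".join(tile) + seq[end:]


def tile_necessity(model, seq, start, end, n_shuffles=10, seed=0):
    """Relative change in the prediction after shuffling one tile.

    Negative values mean the intact tile raised the prediction
    (enhancer-like); positive values mean it lowered it (silencer-like).
    """
    rng = random.Random(seed)
    wild_type = model(seq)
    mutant_mean = sum(
        model(shuffle_tile(seq, start, end, rng)) for _ in range(n_shuffles)
    ) / n_shuffles
    return (mutant_mean - wild_type) / wild_type


# Toy 1,000 bp locus: uniform background with a 40 bp motif-rich tile at 300-340.
background = "C" * 1000
sequence = background[:300] + "GATA" * 10 + background[340:]

enhancer_effect = tile_necessity(toy_model, sequence, 300, 340)
control_effect = tile_necessity(toy_model, sequence, 600, 640)
print(f"motif tile: {enhancer_effect:+.3f}, control tile: {control_effect:+.3f}")
```

In this toy setting, shuffling the motif-rich tile lowers the prediction, while shuffling a background-only tile leaves it unchanged. Averaging over several shuffles reduces the chance that a single random shuffle happens to recreate a motif and masks the effect.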

Interpreting cis-regulatory interactions from large-scale deep neural networks

Deciphering gene regulation from gene expression dynamics using deep neural network

A mechanistically interpretable neural network for regulatory genomics

EPInformer: A Scalable Deep Learning Framework for Gene Expression Prediction by Integrating Promoter-enhancer Sequences with Multimodal Epigenomic Data

Identification, Design, and Application of Noncoding Cis-Regulatory Elements

Multiomic foundation model predicts epigenetic regulation by zero-shot

CREATE: cell-type-specific cis-regulatory elements identification via discrete embedding

Biologically Informed Deep Learning to Infer Gene Program Activity in Single Cells

DeepRegFinder: deep learning-based regulatory elements finder

Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models

Probabilistic association of differentially expressed genes with cis-regulatory elements

DIRECT-NET: an Efficient Method to Discover Cis-Regulatory Elements and Construct Regulatory Networks from Single-Cell Multiomics Data

Active learning of enhancer and silencer regulatory grammar in a developing neural tissue

Comprehensive network modeling approaches unravel dynamic enhancer-promoter interactions across neural differentiation

Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network

MCNET: Multi-Omics Integration for Gene Regulatory Network Inference from scRNA-seq

Evaluation and optimization of sequence-based gene regulatory deep learning models

Functional Dissection of Regulatory Models Using Gene Expression Data of Deletion Mutants

Effective gene expression prediction from sequence by integrating long-range interactions