Interpreting Cis-Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics

Shushan Toneyan, Peter K Koo
DOI: https://doi.org/10.1101/2023.07.03.547592
2024-03-21
Abstract: The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover the rules of gene regulation that they learn. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the intricate complexity of higher-order CRE interactions, the relationship between CRE distance from transcription start sites and gene expression, as well as the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.
Genomics
What problem does this paper attempt to address?
This paper addresses how to better understand and interpret large-scale deep neural networks (DNNs) that predict gene expression from genomic sequence. It proposes a tool named CREME (Cis-Regulatory Element Model Explanations), which uses in silico (computational) perturbation experiments to explore the gene regulation rules learned by large DNNs. The problem can be subdivided into the following aspects:

1. **Evaluating and Interpreting DNN Predictions**: Current evaluations of DNN predictions mainly compare them with the results of experimental perturbation assays, such as massively parallel reporter assays and CRISPR interference. This shows how well the model generalizes within the studied loci, but it reveals little about what drives the predictions. Moreover, existing model interpretability tools focus mainly on motif analysis, which becomes complicated when dealing with longer sequences.
2. **Revealing the Role of cis-Regulatory Elements (CREs)**: With CREME, the researchers aim to identify CREs that directly enhance or silence the expression of target genes and to characterize the complexity of higher-order CRE interactions.
3. **Exploring the Relationship between CREs and Transcription Start Site (TSS) Distance**: The researchers use CREME to examine how the distance between a CRE and the TSS affects gene expression.
4. **Analyzing the Biochemical Characteristics of Enhancers and Silencers**: The researchers also aim to understand the biochemical properties of the enhancers and silencers learned by the DNN, including their epigenetic features and transcription factor binding.
5. **Improving the Resolution of Functional Sequence Elements**: Finally, CREME is intended to provide an efficient way to identify functional sequence elements within CREs at higher resolution, giving a more detailed view of the mechanisms of gene regulation.

Overall, by introducing CREME, this paper aims to overcome the limitations of existing methods for evaluating and interpreting large-scale DNN predictions of gene expression, providing new perspectives and tools for gene regulation research.
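
To make the idea of an in silico perturbation concrete, the sketch below shuffles one window ("tile") of a sequence and measures how much a model's prediction changes. It is only an illustration under stated assumptions: it does not use CREME's actual API or Enformer, `toy_model` is a hypothetical stand-in that counts a `GATA` motif, and the function names, tile coordinates, and normalization are all chosen for this example rather than taken from the paper.

```python
import numpy as np

def one_hot(seq):
    """One-hot encode a DNA string into an (L, 4) array in A, C, G, T order."""
    idx = np.array(["ACGT".index(base) for base in seq])
    return np.eye(4)[idx]

def shuffle_tile(x, start, end, rng):
    """Return a copy of x with positions [start, end) randomly permuted."""
    x_mut = x.copy()
    x_mut[start:end] = x[start:end][rng.permutation(end - start)]
    return x_mut

def tile_effect(model, x, start, end, n_shuffles=10, seed=0):
    """Normalized drop in prediction after shuffling one tile.

    Positive values suggest the tile acts like an enhancer in the model,
    negative values like a silencer (assumed convention for this sketch).
    """
    rng = np.random.default_rng(seed)
    wt = model(x)
    mut = np.mean([model(shuffle_tile(x, start, end, rng))
                   for _ in range(n_shuffles)])
    return (wt - mut) / wt

def toy_model(x):
    """Hypothetical stand-in for a DNN: baseline 1.0 plus 1.0 per 'GATA' motif."""
    seq = "".join("ACGT"[i] for i in x.argmax(axis=1))
    return 1.0 + float(seq.count("GATA"))

# Toy sequence: three GATA motifs (positions 20-31) flanked by poly-C background.
seq = "C" * 20 + "GATAGATAGATA" + "C" * 20
x = one_hot(seq)

enhancer_effect = tile_effect(toy_model, x, 20, 32)  # tile covering the motifs
control_effect = tile_effect(toy_model, x, 0, 12)    # background-only tile
print("motif tile effect:  ", enhancer_effect)
print("control tile effect:", control_effect)
```

Averaging over several shuffles reduces the chance that a single random permutation happens to recreate (or destroy) a motif. With a real model, `toy_model` would be replaced by a function that returns the predicted expression at the gene's TSS.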