Dissecting the cis-regulatory syntax of transcription initiation with deep learning

Kelly Cochran,Melody Yin,Anika Mantripragada,Jacob Schreiber,Georgi K Marinov,Anshul Kundaje
DOI: https://doi.org/10.1101/2024.05.28.596138
2024-06-01
Abstract:Despite extensive characterization of mammalian Pol II transcription, the DNA sequence determinants of transcription initiation at a third of human promoters and most enhancers remain poorly understood. Hence, we trained and interpreted a neural network called ProCapNet that accurately models base-resolution initiation profiles from PRO-cap experiments using local DNA sequence. ProCapNet learns sequence motifs with distinct effects on initiation rates and TSS positioning and uncovers context-specific cryptic initiator elements intertwined within other TF motifs. ProCapNet annotates predictive motifs in nearly all actively transcribed regulatory elements across multiple cell-lines, revealing a shared cis-regulatory logic across promoters and enhancers mediated by a highly epistatic sequence syntax of cooperative and competitive motif interactions. ProCapNet models of RAMPAGE profiles measuring steady-state RNA abundance at TSSs distill initiation signals on par with models trained directly on PRO-cap profiles. ProCapNet learns a largely cell-type-agnostic cis-regulatory code of initiation complementing sequence drivers of cell-type-specific chromatin state critical for accurate prediction of cell-type-specific transcription initiation.
Genomics
What problem does this paper attempt to address?