GenomicLinks: Deep learning predictions of 3D chromatin loops in the maize genome

Luca Schlegel, Rohan Bhardwaj, Yadollah Shahryary, Defne Demirtürk, Alexandre P. Marand, Robert J. Schmitz, Frank Johannes
DOI: https://doi.org/10.1101/2024.05.06.592633
2024-05-08
Abstract: Gene regulation in eukaryotes is partly shaped by the 3D organization of chromatin within the cell nucleus. Distal interactions between cis-regulatory elements and their target genes are widespread, and many causal loci underlying heritable agricultural traits have been mapped to distal non-coding elements. The biology underlying chromatin loop formation in plants is poorly understood. Dissecting the sequence features that mediate distal interactions is an important step toward identifying putative molecular mechanisms. Here, we trained GenomicLinks, a deep learning model, to identify DNA sequence features predictive of 3D chromatin interactions in maize. We found that the presence of binding motifs of specific Transcription Factor classes, especially bHLH, is predictive of chromatin interaction specificities. Using an in silico mutagenesis approach, we show that the removal of these motifs from loop anchors leads to reduced interaction probabilities. We were able to validate these predictions with single-cell co-accessibility data from different maize genotypes that harbor natural substitutions in these TF binding motifs. GenomicLinks is currently implemented as an open-source web tool, which should facilitate its wider use in the plant research community.
Genomics
What problem does this paper attempt to address?
The paper addresses the prediction of 3D chromatin loops in plant genomes directly from DNA sequence. In animals, chromatin loop formation is mediated mainly by the CTCF-cohesin complex, but plants lack CTCF and the mechanism behind loop formation remains unclear. The authors developed GenomicLinks, a deep learning model that combines convolutional neural network (CNN) and long short-term memory (LSTM) layers to identify spatial and sequential features in DNA sequences, and trained it on HiChIP data to predict chromatin interactions in the maize genome. They found that the specificity of chromatin interactions is associated with binding motifs of certain transcription factor (TF) classes, especially bHLH. Using in silico mutagenesis, they showed that removing these motifs from loop anchors lowers the predicted interaction probability. These predictions were validated with single-cell co-accessibility data from maize genotypes that carry natural substitutions in the TF binding motifs. GenomicLinks reportedly achieves high prediction accuracy on maize and has been released as an open-source web tool for plant researchers. It may help clarify the molecular mechanisms of chromatin organization in plants and suggest candidates for functional validation and breeding.
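The in silico mutagenesis idea can be illustrated with a minimal Python sketch. It is not the GenomicLinks implementation: `toy_predict` is a hypothetical stand-in for a trained model, its weights are made up, and treating the G-box `CACGTG` as the only bHLH motif is an assumption made for brevity. Masking motifs with `N` is likewise just one possible way to "remove" them. The sketch only shows the comparison itself: score a pair of loop anchors, mask the motif in both, score again, and report the change.

```python
import math

# Canonical G-box, one motif recognized by many bHLH transcription factors.
# Restricting the example to this single motif is an illustrative assumption.
MOTIF = "CACGTG"


def count_motif(seq, motif=MOTIF):
    """Count occurrences of `motif` in `seq` (case-insensitive)."""
    seq = seq.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def toy_predict(anchor_a, anchor_b):
    """Hypothetical stand-in for a trained model.

    Returns a logistic score that rises with the number of motifs found in
    the two anchors. The intercept and weight are arbitrary toy values.
    """
    z = -1.0 + 0.8 * (count_motif(anchor_a) + count_motif(anchor_b))
    return 1.0 / (1.0 + math.exp(-z))


def mask_motif(seq, motif=MOTIF):
    """Replace every motif occurrence with N's, keeping the sequence length."""
    return seq.upper().replace(motif, "N" * len(motif))


def in_silico_mutagenesis(anchor_a, anchor_b, predict=toy_predict):
    """Compare predictions for the original anchors and motif-masked anchors.

    Returns (wild_type_score, mutant_score, mutant_score - wild_type_score).
    """
    wild_type = predict(anchor_a, anchor_b)
    mutant = predict(mask_motif(anchor_a), mask_motif(anchor_b))
    return wild_type, mutant, mutant - wild_type


if __name__ == "__main__":
    # Made-up anchor sequences, far shorter than real loop anchors.
    a = "TTGACACGTGTCAATCACGTGAA"
    b = "GGCACGTGCCTTAAGG"
    wt, mut, delta = in_silico_mutagenesis(a, b)
    print(f"wild type: {wt:.3f}  mutant: {mut:.3f}  change: {delta:+.3f}")
```

With a real model, `predict` would be replaced by a call to the trained network on encoded anchor sequences. A negative change after masking would correspond to the reduced interaction probability described above.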