Abstract: Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer-based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.

What problem does this paper attempt to address?

This paper aims to solve key problems in the protein-DNA binding mechanism, especially the influence of DNA shape on these binding mechanisms. Specifically, the research focuses on developing a deep-learning-based method, Deep DNAshape, to predict DNA structural features in a high-throughput manner. This method significantly improves on the current k-mer-based high-throughput prediction methods by considering the influence of extended flanking regions, without the need for a large number of molecular simulations or structural biology experiments. The following are the specific problems that this paper attempts to solve:

1. **Improve the accuracy of DNA shape prediction**: Existing methods such as DNAshape rely on pentamer lookup tables, which can only consider the influence of the nearest and next-nearest neighbors and ignore the influence of the more distant sequence environment. Deep DNAshape overcomes this limitation with a deep-learning model and can predict DNA shape features more accurately, especially when the influence of extended flanking regions is considered.
2. **Understand the influence of flanking regions on the core DNA structure**: The paper explores how flanking regions affect the core DNA structure through high-throughput prediction of DNA shape features. This helps to gain a deeper understanding of the detailed structural readout mechanism of protein-DNA binding, especially for transcription factors (TFs) with long core motifs.
3. **Improve the prediction accuracy of machine-learning models**: The research found that when the features generated by Deep DNAshape are incorporated into machine-learning models, the prediction accuracy of the models for TF-DNA binding specificity can be significantly improved. This provides new tools and methods for the study of gene regulation mechanisms.
4. **Predict DNA shape fluctuations**: In addition to static DNA shape features, Deep DNAshape can also predict DNA shape fluctuations, which helps to understand the conformational flexibility of DNA molecules and their influence on protein binding.

In summary, the main objective of this paper is to provide an efficient and accurate tool for predicting DNA shape features and their fluctuations by developing the Deep DNAshape method, so as to better understand the protein-DNA binding mechanism and improve the predictive ability of relevant machine-learning models.
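
The limitation of pentamer lookup tables mentioned in the first point can be illustrated with a minimal sketch. It is not the DNAshape or Deep DNAshape implementation: the function names are hypothetical, and `toy_pentamer_value` is a made-up placeholder standing in for a real lookup-table entry. The sketch only shows that a sliding pentamer window assigns the same value to a position whenever its two neighbors on each side are identical, regardless of the more distant flanks.

```python
from typing import List, Optional


def toy_pentamer_value(pentamer: str) -> float:
    """Hypothetical stand-in for a lookup-table entry (NOT a real shape value)."""
    weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
    # Position-weighted sum, so different pentamers usually map to different values.
    return sum(weights[base] * (i + 1) for i, base in enumerate(pentamer)) / 10.0


def pentamer_lookup_predict(seq: str) -> List[Optional[float]]:
    """Assign each position a value from the pentamer centered on it.

    The two positions at each end have no complete pentamer and stay None.
    """
    seq = seq.upper()
    values: List[Optional[float]] = [None] * len(seq)
    for i in range(2, len(seq) - 2):
        values[i] = toy_pentamer_value(seq[i - 2 : i + 3])
    return values


# Same core pentamer, different flanking sequences (toy sequences).
core = "ACGTA"
seq_a = "TTT" + core + "GGG"
seq_b = "CCC" + core + "AAA"

pred_a = pentamer_lookup_predict(seq_a)
pred_b = pentamer_lookup_predict(seq_b)

center = 3 + 2  # index of the central base of the core in both sequences
print(pred_a[center] == pred_b[center])  # the flanks cannot change this value
```

A model that accounts for extended flanking regions, as the summary describes for Deep DNAshape, could in principle return different values at `center` for `seq_a` and `seq_b`; a pure pentamer lookup cannot.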
