Abstract: The identification of transcription factor binding sites (TFBSs) is crucial for understanding the regulatory mechanisms of gene expression, which in turn helps to unravel cellular functions and disease development. Currently, the most common approach uses deep learning techniques to predict TFBSs by combining sequence and shape features. Although these methods have made significant progress, the local features extracted from DNA sequences and shapes are still insufficiently integrated with global features, and prediction accuracy leaves considerable room for improvement. In this paper, we propose TBCA, a novel framework based on convolution and attention mechanisms that combines DNA sequence information and shape information to predict TFBSs. We employ a two-layer convolutional neural network (CNN) and a self-attention mechanism to extract complex sequence features from DNA. In addition, we use Fourier-transform-enhanced multi-head attention together with channel attention to extract high-order shape features of DNA. Finally, these high-order sequence and shape features are integrated along the channel dimension to achieve accurate TFBS prediction. Our results demonstrate that TBCA exhibits superior predictive performance on 165 validated ChIP-seq datasets. Furthermore, the attention mechanisms automatically learn important features at different positions and scales, enhancing the accuracy and robustness of the feature representation. We also conduct an in-depth analysis of the contributions of five different shape features to site prediction, revealing that shape features can enhance the prediction of transcription factor-DNA binding.
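
The fusion step described in the abstract, combining sequence and shape features along the channel dimension, can be illustrated with a minimal sketch. This is not TBCA's implementation: the function names `one_hot` and `concat_channels` are hypothetical, the raw one-hot encoding stands in for the learned high-order sequence features, and the five shape tracks hold made-up placeholder values rather than real DNA shape measurements.

```python
# Minimal sketch of channel-wise fusion of sequence and shape features.
# All names and values are illustrative assumptions, not TBCA's actual code.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as 4 channels x len(seq) positions.

    Bases outside A/C/G/T (e.g. N) become all-zero columns.
    """
    return [[1.0 if base == b else 0.0 for base in seq.upper()] for b in BASES]

def concat_channels(seq_feats, shape_feats):
    """Stack two feature maps along the channel axis; lengths must match."""
    length = len(seq_feats[0])
    for row in seq_feats + shape_feats:
        if len(row) != length:
            raise ValueError("all channels must cover the same number of positions")
    return seq_feats + shape_feats

seq = "ACGTN"
seq_feats = one_hot(seq)  # 4 channels x 5 positions

# Five placeholder shape tracks (hypothetical values, one channel per shape feature).
shape_feats = [[0.1 * (c + 1)] * len(seq) for c in range(5)]

fused = concat_channels(seq_feats, shape_feats)
print(len(fused), len(fused[0]))  # 9 5
```

In a real model, each branch would first pass through its own feature extractor (convolution and attention layers), and only the resulting feature maps would be concatenated; the sketch shows only the shape bookkeeping of that final step.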

Predicting Transcription Factor Binding Sites by a Multi-Modal Representation Learning Method Based on Cross-Attention Network

Prediction of Transcription Factor Binding Sites with an Attention Augmented Convolutional Neural Network

Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach

MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites

Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data

Deeptf: Accurate Prediction Of Transcription Factor Binding Sites By Combining Multi-Scale Convolution And Long Short-Term Memory Neural Network

High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method

Cooperation of local features and global representations by a dual-branch network for transcription factor binding sites prediction

MLSNet: a deep learning model for predicting transcription factor binding sites

By hybrid neural networks for prediction and interpretation of transcription factor binding sites based on multi-omics

TBCA: Prediction of transcription factor binding sites using a deep neural network with lightweight attention mechanism

Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture

BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning

Predicting Transcription Factor Binding Sites with Deep Learning

A Novel Convolution Attention Model for Predicting Transcription Factor Binding Sites by Combination of Sequence and Shape

MAResNet: predicting transcription factor binding sites by combining multi-scale bottom-up and top-down attention and residual network

DeepARC: An Attention-based Hybrid Model for Predicting Transcription Factor Binding Sites from Positional Embedded DNA Sequence

DeepSTF: predicting transcription factor binding sites by interpretable deep neural networks combining sequence and shape

Transfer learning and DNA language models enhance transcription factor binding predictions

Enhancing the interpretability of transcription factor binding site prediction using attention mechanism

Multiomics-integrated Deep Language Model Enables in Silico Genome-Wide Detection of Transcription Factor Binding Site in Unexplored Biosamples