DeepARC: An Attention-based Hybrid Model for Predicting Transcription Factor Binding Sites from Positional Embedded DNA Sequence

Jialong Chen, Lei Deng
DOI: https://doi.org/10.1109/BIBM49941.2020.9313249
2020-01-01
Abstract: The binding of transcription factors (TFs) to transcription factor binding sites (TFBS) plays a pivotal role in regulating gene expression and evolution. Accurately modeling the specificity of DNA and searching for TFBS help us understand the genome's function and evolution. In recent years, computational identification of TFBS has become an active field of research. Here, we propose DeepARC, an attention-based hybrid approach that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to predict TFBS. We employ a position-based embedding strategy to embed a DNA sequence into a matrix whose distributed representation contains position information, and then feed this representation into a CNN-BiLSTM-Attention framework that classifies whether a sequence contains a TFBS. Taking advantage of the attention mechanism, DeepARC can extract more valuable information about TFBS and add interpretability to the TFBS search process. Moreover, extensive experiments show that DeepARC outperforms existing predictors. The DeepARC web server is available at http://deeparc.denglab.org.
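The abstract does not give the details of the position-based embedding or of the attention layer, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes overlapping 3-mers as tokens, a token vector and a position vector that are summed for each token, and a simple dot-product attention pooling over the resulting matrix. The CNN and BiLSTM stages are omitted, and random numbers stand in for learned parameters. All names (`kmer_tokens`, `embed`, `attention_pool`, `K`, `DIM`, `MAX_LEN`) are hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_index(kmer):
    """Map a k-mer to an integer id by reading it as a base-4 number."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + NUCLEOTIDES.index(base)
    return idx


def make_table(rows, dim, rng):
    """Random lookup table standing in for learned embedding weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(seq, token_table, pos_table, k=3):
    """Return one vector per k-mer: token embedding plus position embedding."""
    matrix = []
    for pos, kmer in enumerate(kmer_tokens(seq, k)):
        tok_vec = token_table[kmer_index(kmer)]
        pos_vec = pos_table[pos]
        matrix.append([t + p for t, p in zip(tok_vec, pos_vec)])
    return matrix


def attention_pool(features, query):
    """Softmax-weighted sum of feature rows; the weights show which positions matter."""
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * row[d] for w, row in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Assumed hyperparameters, chosen only for illustration.
K, DIM, MAX_LEN = 3, 8, 101
rng = random.Random(0)
token_table = make_table(4 ** K, DIM, rng)
pos_table = make_table(MAX_LEN - K + 1, DIM, rng)
query = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

matrix = embed("ACGTTGCAAC", token_table, pos_table, K)
pooled, weights = attention_pool(matrix, query)
```

In a full model the attention weights would be computed over BiLSTM outputs rather than over raw embeddings; inspecting `weights` per position is the kind of interpretability signal the abstract refers to.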