DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding

Yongqing Zhang, Shaojie Qiao, Shengjie Ji, Yizhou Li
DOI: https://doi.org/10.1007/s13042-019-00990-x
2019-07-29
International Journal of Machine Learning and Cybernetics
Abstract: Transcription factors are regulatory proteins that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription, the process by which genetic information is copied from DNA to RNA. Several computational methods have been developed to predict DNA–protein binding sites in DNA sequences using convolutional neural networks (CNNs). However, within the CNN framework these techniques have limited ability to capture dependency information in DNA sequences. In addition, these methods are not accurate enough at predicting DNA–protein binding sites from DNA sequence. In this study, we combine bidirectional long short-term memory (BLSTM) and CNN to capture long-term dependencies between sequence motifs in DNA, an approach we call DeepSite. Unlike a traditional CNN, DeepSite comprises six layers: an input layer, a BLSTM layer, a CNN layer, a pooling layer, a fully connected layer and an output layer. When tested on 690 ChIP-seq experiments from ENCODE, DeepSite predicts DNA–protein binding sites with 87.12% sensitivity, 91.06% specificity, 89.19% accuracy and an MCC (Matthews correlation coefficient) of 0.783. Lastly, we conclude that the proposed method can also be applied to find DNA–protein binding sites in other DNA sequences.
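The four evaluation figures in the abstract (sensitivity, specificity, accuracy and MCC) follow the standard confusion-matrix definitions for binary classification. The sketch below is a minimal illustration of those definitions, assuming binding-site prediction is scored as a binary task; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's data or code.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    # Sensitivity (recall, true positive rate): fraction of true binding sites found.
    sensitivity = tp / (tp + fn)
    # Specificity (true negative rate): fraction of non-binding sequences rejected.
    specificity = tn / (tn + fp)
    # Accuracy: fraction of all predictions that are correct.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when any marginal total is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to show the arithmetic.
    print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
```

With these made-up counts, sensitivity is 90/100 = 0.90, specificity is 80/100 = 0.80, accuracy is 170/200 = 0.85, and MCC is 7000/sqrt(110 × 100 × 100 × 90), about 0.704. MCC is often reported alongside accuracy because it stays informative when positive and negative examples are imbalanced.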
computer science, artificial intelligence