Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data

Youhong Xu, Changan Yuan, Hongjie Wu, Xingming Zhao
DOI: https://doi.org/10.1007/978-3-031-13870-6_65
2022-01-01
Abstract: Transcription factors (TFs) have a major effect on gene transcription. By binding to specific DNA regions, called TF binding sites (TFBSs), TFs promote or inhibit the transcription of target genes and thereby shape a complex gene-expression regulatory system. In recent years, deep learning (DL) methods have developed rapidly in natural language processing (NLP) and computer vision (CV), where they outperform previous state-of-the-art methods. Many researchers have applied these methods to motif discovery, e.g., DeepBind and DanQ. However, these methods use only the raw DNA sequence as input. Rather than designing ever more complex models, one can exploit the massive biological data produced by high-throughput sequencing technology. In this paper, we propose DeepCR, a simple and effective DL-based model that integrates multiple-omics data to predict TFBSs. Experiments on 21 motif datasets of the GM12878 cell line from in-vitro protein binding microarray data show that multiple-omics data significantly improve overall performance. Specifically, the average AUC improves by 3.89% with histone modifications, by 3.77% with MeDIP-seq, and by 6.63% with histone modifications and MeDIP-seq together. The mean AR increases by 3.90% with histone modifications, by 4.50% with MeDIP-seq, and by 6.00% with both together.
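The abstract does not describe DeepCR's architecture, so the following is only a minimal sketch of one common way to feed sequence plus omics data to a convolutional model, under stated assumptions. It assumes each omics signal (for example a histone-modification track or a MeDIP-seq track) is available as one value per base and is stacked with the one-hot DNA encoding as extra input channels. The helper names (`one_hot_dna`, `stack_multiomics`, `conv_score`, `roc_auc`) are hypothetical, the filter weights are random and untrained, and AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot_dna(seq):
    """Encode a DNA string as a (4, L) one-hot matrix; unknown bases (e.g. N) stay all-zero."""
    seq = seq.upper()
    mat = np.zeros((4, len(seq)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            mat[j, i] = 1.0
    return mat


def stack_multiomics(seq, tracks):
    """Hypothetical helper: append per-base omics tracks as extra channels.

    tracks: list of 1-D arrays of length L (e.g. histone marks, MeDIP-seq signal).
    Returns an array of shape (4 + len(tracks), L).
    """
    x = one_hot_dna(seq)
    if not tracks:
        return x
    extra = np.stack([np.asarray(t, dtype=np.float32) for t in tracks])
    if extra.shape[1] != x.shape[1]:
        raise ValueError("every track must have one value per base")
    return np.concatenate([x, extra], axis=0)


def conv_score(x, kernels, bias, w_out, b_out):
    """Toy forward pass: 1-D convolution (valid padding), ReLU,
    global max pooling over positions, then a logistic output unit."""
    n_filters, _, width = kernels.shape
    n_pos = x.shape[1] - width + 1
    acts = np.empty((n_filters, n_pos))
    for pos in range(n_pos):
        window = x[:, pos:pos + width]  # shape (channels, width)
        acts[:, pos] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    pooled = np.maximum(acts, 0.0).max(axis=1)  # one value per filter
    logit = float(pooled @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-logit))


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = "ACGTTGCAACGT"
    # Placeholder signals standing in for real histone / MeDIP-seq tracks.
    x = stack_multiomics(seq, [rng.random(len(seq)), rng.random(len(seq))])
    print(x.shape)  # (6, 12): 4 DNA channels + 2 omics channels
    kernels = rng.normal(size=(8, x.shape[0], 5))  # 8 untrained filters of width 5
    print(conv_score(x, kernels, np.zeros(8), rng.normal(size=8), 0.0))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Stacking tracks as channels lets the same convolution filters learn joint sequence-and-signal patterns; in this framing, sequence-only baselines correspond to passing an empty `tracks` list.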