Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data

Mahdi Zamanighomi, Zhixiang Lin, Yong Wang, Rui Jiang, Wing Hung Wong
DOI: https://doi.org/10.1093/nar/gkx358
IF: 14.9
2017-05-03
Nucleic Acids Research
Abstract: Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number of TFs (∼800) with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs with such TFs will avoid tremendous experimental effort and enable a deeper understanding of transcriptional regulatory functions. We present a method to associate known motifs with TFs (MATLAB code is available in Supplementary Materials). Our method is based on a probabilistic framework that not only exploits DNA-binding domains and specificities, but also integrates open chromatin, gene expression and genomic data to accurately infer monomeric and homodimeric binding motifs. Our analysis resulted in the assignment of motifs to 200 TFs with no SELEX-derived motifs, roughly a 50% increase compared to the existing coverage.
biochemistry & molecular biology