Application of topic models to a compendium of ChIP-Seq datasets uncovers recurrent transcriptional regulatory modules

Guodong Yang, Aiqun Ma, Zhaohui S. Qin, Li Chen
DOI: https://doi.org/10.1093/bioinformatics/btz975
IF: 5.8
2020-01-01
Bioinformatics
Abstract: Motivation: The availability of thousands of genome-wide coupling chromatin immunoprecipitation (ChIP)-Seq datasets across hundreds of transcription factors (TFs) and cell lines provides an unprecedented opportunity to jointly analyze large-scale TF binding in vivo, making it possible to discover potential interaction and cooperation among different TFs. Interacting and cooperating TFs can form a transcriptional regulatory module (TRM) (e.g. co-binding TFs), which helps decipher combinatorial regulatory mechanisms. Results: We develop a computational method, tfLDA, that applies state-of-the-art topic models to multiple ChIP-Seq datasets to decipher the combinatorial binding events of multiple TFs. tfLDA learns high-order combinatorial binding patterns of TFs from multiple ChIP-Seq profiles and supports interpreting and visualizing those patterns. We apply tfLDA to two cell lines with a rich collection of TFs and identify combinatorial binding patterns that recover well-known TRMs and related TF co-binding events.
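To make the topic-model idea concrete, the following is a minimal sketch and not the tfLDA implementation. It assumes one common way of casting the problem: each genomic region is treated as a "document", each TF with a ChIP-Seq peak in that region is a "word", and each learned topic is a candidate TRM. The abstract does not specify this encoding, so it is an assumption here, as are the function name fit_lda_gibbs, the hyperparameters alpha and beta, the placeholder TF names, and the made-up toy regions. The model is plain latent Dirichlet allocation fit by collapsed Gibbs sampling.

```python
import random


def fit_lda_gibbs(docs, n_words, n_topics, alpha=0.1, beta=0.01,
                  n_iter=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative, not tfLDA).

    docs: list of lists of word ids; here, the TF indices bound in each region.
    Returns (doc_topic, topic_word), both as row-normalised lists of lists.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]            # region-by-topic counts
    n_kw = [[0] * n_words for _ in range(n_topics)]  # topic-by-TF counts
    n_k = [0] * n_topics                             # tokens per topic
    z = []                                           # topic of every token

    # Random initial topic assignment for every (region, TF) token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates of P(TF | topic) and P(topic | region).
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, topic_word


# Made-up toy input: placeholder TF names and invented co-binding regions.
tf_names = ["TF_A", "TF_B", "TF_C", "TF_D", "TF_E"]
regions = [
    [0, 1, 2], [0, 1], [1, 2], [0, 1, 2],  # regions bound by TF_A/B/C
    [3, 4], [3, 4], [3, 4], [4, 3],        # regions bound by TF_D/E
]
doc_topic, topic_word = fit_lda_gibbs(regions, n_words=len(tf_names),
                                      n_topics=2)

# Rank TFs within each topic; the top-weighted TFs form a candidate module.
for k, row in enumerate(topic_word):
    ranked = sorted(zip(tf_names, row), key=lambda pair: -pair[1])
    print(k, [(name, round(p, 3)) for name, p in ranked])
```

Each row of topic_word is a probability distribution over TFs, so the highest-weighted TFs in a topic can be read as a putative co-binding module, and each row of doc_topic indicates which modules are active in a region. Real ChIP-Seq input would first need peaks from all datasets mapped onto a shared set of regions, a preprocessing step this sketch leaves out.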