Identification of transcription factor co-binding patterns with non-negative matrix factorization

Ieva Rauluseviciute, Timothée Launay, Guido Barzaghi, Sarvesh Nikumbh, Boris Lenhard, Arnaud Regis Krebs, Jaime A Castro-Mondragon, Anthony Mathelier
DOI: https://doi.org/10.1093/nar/gkae743
IF: 14.9
2024-09-03
Nucleic Acids Research
Abstract: Transcription factor (TF) binding to DNA is critical to transcription regulation. Although the binding properties of numerous individual TFs are well-documented, a more detailed comprehension of how TFs interact cooperatively with DNA is required. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to identify TF co-binding patterns automatically. COBIND applies NMF to one-hot encoded regions flanking known TF binding sites (TFBSs) to pinpoint enriched DNA patterns at fixed distances. We applied COBIND to 5699 TFBS datasets from UniBind for 401 TFs in seven species. The method uncovered already established co-binding patterns and new co-binding configurations not yet reported in the literature and inferred through motif similarity and protein-protein interaction knowledge. Our extensive analyses across species revealed that 67% of the TFs shared a co-binding motif with other TFs from the same structural family. The co-binding patterns captured by COBIND are likely functionally relevant as they harbor higher evolutionary conservation than isolated TFBSs. Open chromatin data from matching human cell lines further supported the co-binding predictions. Finally, we used single-molecule footprinting data from mouse embryonic stem cells to confirm that the COBIND-predicted co-binding events associated with some TFs likely occurred on the same DNA molecules.
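The core idea stated in the abstract (NMF applied to one-hot encoded flanking regions to find a pattern enriched at a fixed distance) can be illustrated with a toy sketch. This is not the COBIND implementation: the helper names (`one_hot`, `nmf`, `consensus`), the synthetic flanks, the planted `TGAC` pattern, the rank of 2, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration only.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a 4*len(seq) vector of 0/1 values."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V ~ W H with non-negative W, H (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def consensus(h_row, min_share=0.6):
    """Read one NMF component as a per-position consensus ('n' = no clear base)."""
    out = []
    for p in range(0, len(h_row), 4):
        col = h_row[p:p + 4]
        total = sum(col)
        best = max(range(4), key=lambda b: col[b])
        clear = total > 0 and col[best] / total >= min_share
        out.append(ALPHABET[best] if clear else "n")
    return "".join(out)

# Synthetic flanking regions (hypothetical data): 40 flanks of length 10,
# half of which carry a planted pattern at a fixed offset from the anchor TFBS.
rng = random.Random(1)
flanks = []
for i in range(40):
    seq = [rng.choice(ALPHABET) for _ in range(10)]
    if i < 20:
        seq[3:7] = "TGAC"  # planted co-binding pattern at offset 3
    flanks.append("".join(seq))

v = [one_hot(s) for s in flanks]   # rows: sequences, columns: position x base
w, h = nmf(v, rank=2)              # h rows: candidate position-specific patterns

for k, row in enumerate(h):
    print(f"component {k}: {consensus(row)}")
```

Each row of `h` spans every position and base of the flank, so a component that concentrates its weight on a few adjacent positions suggests a pattern at a fixed distance from the anchor site, while the matching column of `w` indicates which sequences carry it. Whether the planted pattern shows up cleanly depends on the rank, the initialization, and the toy data.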
biochemistry & molecular biology