CCIP: Predicting CTCF-mediated Chromatin Loops with Transitivity

Weibing Wang, Lin Gao, Yusen Ye, Yong Gao
DOI: https://doi.org/10.1093/bioinformatics/btab534
IF: 5.8
2021-01-01
Bioinformatics
Abstract:
Motivation: CTCF-mediated chromatin loops underlie the formation of topological associating domains and serve as the structural basis for transcriptional regulation. However, the mechanism by which these loops form remains unclear, and mapping them genome-wide is costly and difficult. Motivated by recent studies on how CTCF-mediated loops form, we investigated whether transitivity-related information about interacting CTCF anchors can be used to predict CTCF loops computationally. In this context, transitivity arises when two CTCF anchors each interact with the same third anchor through the loop extrusion mechanism, which brings them spatially close to each other and forms an indirect loop.
Results: To determine whether transitivity is informative for predicting CTCF loops, and to obtain an accurate, low-cost prediction method, we propose CTCF-mediated Chromatin Interaction Prediction (CCIP), a two-stage, random-forest-based machine learning method for predicting CTCF-mediated chromatin loops. The two-stage approach allows the prediction model to exploit transitivity-related information together with functional genomic data and genomic data. Experiments showed that CCIP predicts CTCF-mediated loops more accurately than other methods and that transitivity, when defined as an appropriate attribute, is informative for predicting CTCF loops. We further found that transitivity explains the formation of tandem CTCF loops and facilitates enhancer–promoter interactions. This work contributes to understanding the formation mechanism and function of CTCF-mediated chromatin loops.
Availability and implementation: The source code of CCIP is available at https://github.com/GaoLabXDU/CCIP.
Supplementary information: Supplementary data are available at Bioinformatics online.
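The transitivity idea can be illustrated with a small graph computation. The sketch below is a hypothetical illustration, not CCIP's actual feature definition: it treats known or predicted CTCF loops as undirected edges between anchor IDs and, for each pair of anchors, counts the third anchors that both members of the pair loop to. The function name `transitivity_counts` and the integer/string anchor IDs are our own assumptions.

```python
from collections import defaultdict
from itertools import combinations


def transitivity_counts(loops):
    """Hypothetical transitivity feature (not CCIP's exact definition).

    loops: iterable of (anchor_a, anchor_b) pairs of interacting CTCF anchors;
    anchor IDs must be mutually comparable (e.g. all ints or all strings).
    Returns a dict mapping each sorted anchor pair (a, b) to the number of
    third anchors k such that both (a, k) and (k, b) are loops.
    """
    # Build an undirected adjacency map; sets ignore duplicate loops.
    neighbors = defaultdict(set)
    for a, b in loops:
        if a == b:
            continue  # skip self-pairs, which cannot bridge two anchors
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Each anchor k "bridges" every pair of its partners: an indirect loop.
    counts = defaultdict(int)
    for k, partners in neighbors.items():
        for a, b in combinations(sorted(partners), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Anchors 1 and 3 are each looped to both 2 and 4, so they share two bridges.
    print(transitivity_counts([(1, 2), (2, 3), (1, 4), (4, 3)]))
```

In a two-stage setup like the one described above, one plausible (assumed) design is to compute such counts from first-stage predicted loops and pass them to the second-stage classifier as an additional attribute; the CCIP repository should be consulted for the authors' actual feature definition.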