Computer Analysis of Colocalization of the TFs’ Binding Sites in the Genome According to the ChIP-seq Data
A. I. Dergilev, A. M. Spitsina, I. V. Chadaeva, A. V. Svichkarev, F. M. Naumenko, E. V. Kulakova, E. R. Galieva, E. E. Vityaev, M. Chen, Yu. L. Orlov
DOI: https://doi.org/10.1134/s2079059717050057
2017-01-01
Russian Journal of Genetics Applied Research
Abstract: A computer program was developed for identifying clusters of binding sites of different transcription factors (TFs) from the genomic coordinates of ChIP-seq (chromatin immunoprecipitation sequencing) profile peaks. The statistical features of the distribution of transcription factor binding sites (TFBSs) in the mouse genome, obtained from ChIP-seq experiments in embryonic stem cells, are considered. Clusters containing at least four binding sites of different TFs were identified in the mouse genome, and their localization relative to the regulatory regions of genes is described. Two types of site colocalization are confirmed: clusters containing binding sites of Oct4, Nanog, and Sox2, located in distal regions, and clusters with n-Myc and c-Myc binding sites, located mainly in the promoter regions of mouse genes. Analysis of new ChIP-seq data on the binding of the TFs Nr5a2, Tbx3, Cep, SRF, and USF1 in the same cell type confirmed the division of TFBS clusters into two types: those containing binding sites of the pluripotency regulators (Oct4, Nanog, and Sox2) and those not containing them. A computer program for the statistical processing of data on the location of sites in genes was developed; it uses experimental data on site localization obtained by ChIP-seq in the mouse and human genomes. With this program, localization patterns of the binding sites of various TFs were detected. Within site clusters, the distances between the binding sites of the Oct4, Nanog, and Sox2 group and the closest binding sites of other factors were calculated, providing a basis for analyzing the joint binding of protein complexes to DNA. The fraction of ChIP-seq genomic regions containing known TFBS nucleotide motifs was calculated, and the weight matrices for these motifs were recalculated. A correlation between the presence of motifs and ChIP-seq binding intensity is shown. The programs implementing these methods for assessing the clustering of binding sites of various TFs in new ChIP-seq data are available from the authors upon request.
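The abstract does not specify how peaks are grouped into clusters or how distances are measured, so the following is only a minimal sketch of the general idea, not the authors' program. It assumes each peak is reduced to a single coordinate, that neighboring peaks are merged when they lie within a hypothetical max_gap, and that a cluster is kept when it contains at least min_tfs distinct TFs (one possible reading of the "at least four" criterion). The function names, parameters, and toy coordinates are all illustrative.

```python
import bisect
from collections import defaultdict


def cluster_peaks(peaks, max_gap=100, min_tfs=4):
    """Group peaks on the same chromosome whose neighboring coordinates are
    at most max_gap bp apart; keep groups with >= min_tfs distinct TFs.

    peaks: iterable of (chrom, position, tf_name) tuples.
    max_gap and min_tfs are assumed parameters, not values from the paper.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, tf in peaks:
        by_chrom[chrom].append((pos, tf))

    clusters = []

    def keep(chrom, sites):
        # Record the group only if enough distinct TFs are present.
        tfs = {tf for _, tf in sites}
        if len(tfs) >= min_tfs:
            clusters.append({"chrom": chrom, "start": sites[0][0],
                             "end": sites[-1][0], "tfs": sorted(tfs)})

    for chrom in sorted(by_chrom):
        sites = sorted(by_chrom[chrom])
        current = [sites[0]]
        for site in sites[1:]:
            if site[0] - current[-1][0] <= max_gap:
                current.append(site)   # extend the current group
            else:
                keep(chrom, current)   # close the group and start a new one
                current = [site]
        keep(chrom, current)
    return clusters


def nearest_distance(pos, sorted_positions):
    """Distance from pos to the closest coordinate in a non-empty sorted list."""
    i = bisect.bisect_left(sorted_positions, pos)
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(pos - p) for p in candidates)


# Toy coordinates, invented for illustration only.
toy_peaks = [
    ("chr1", 1000, "Oct4"), ("chr1", 1040, "Sox2"),
    ("chr1", 1090, "Nanog"), ("chr1", 1150, "Tbx3"),
    ("chr1", 5000, "c-Myc"), ("chr1", 5050, "n-Myc"),
]
toy_clusters = cluster_peaks(toy_peaks)
```

On these toy data, the first four peaks form one retained cluster, while the c-Myc/n-Myc pair is discarded because it contains only two distinct TFs. A gap-based merge was chosen here for simplicity; a fixed-window or summit-overlap rule would be an equally plausible alternative.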