Computational prediction and characterization of cell-type-specific and shared binding sites
Qinhu Zhang,Pengrui Teng,Siguo Wang,Ying He,Zhen Cui,Zhenghao Guo,Yixin Liu,Changan Yuan,Qi Liu,De-Shuang Huang
DOI: https://doi.org/10.1093/bioinformatics/btac798
IF: 5.8
2022-12-09
Bioinformatics
Abstract:Abstract Motivation Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent research works have provided evidence that such cell-type-specific binding is determined by TF’s intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites is rarely studied. Results In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, in which one is based on XGBoost and another is based on convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, each of which was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. Availability and implementation The source code is available at: https://github.com/turningpoint1988/CSSBS. Supplementary information Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology