TFEA.ChIP: A tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets

Laura Puente-Santamaria, Luis del Peso
DOI: https://doi.org/10.1101/303651
2018-04-18
Abstract: The identification of transcription factors (TFs) responsible for the co-regulation of specific sets of genes is a common problem in transcriptomics. Herein we describe TFEA.ChIP, a tool to estimate and visualize TF enrichment in gene lists representing transcriptional profiles. To generate the gene sets representing TF targets, we gathered ChIP-Seq experiments from the ENCODE Consortium and GEO datasets and used the correlation between DNase Hypersensitive Sites across cell lines to generate a database linking TFs with the genes they interact with in each ChIP-Seq experiment. In its current state, TFEA.ChIP covers 327 different transcription factors from 1075 ChIP-Seq experiments, with over 150 cell types represented. TFEA.ChIP accepts gene sets as well as sorted lists of differentially expressed genes and computes enrichment scores for each of the datasets in its internal database using a Fisher’s exact association test or a Gene Set Enrichment Analysis. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TF was correctly identified in 103 of a total of 144 analyzed datasets, with a median area under the curve (AUC) of 0.86. In-depth analysis of an RNA-seq dataset illustrates that the use of ChIP-Seq data instead of PWM-based predictions provides key biological context to interpret the results of the analysis.
To facilitate its integration into transcriptome analysis pipelines and allow easy expansion and customization of the TF-gene database, we implemented TFEA.ChIP as an R package that can be downloaded from Bioconductor: https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and GitHub: https://github.com/LauraPS1/TFEA-drafts. In addition, to make it available to a wide range of researchers, we have also developed a web application that runs the package on the server side and enables easy exploratory analysis through interactive graphs: https://www.iib.uam.es/TFEA.ChIP/
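
The Fisher's exact association test mentioned in the abstract can be illustrated with a minimal sketch. This is not the TFEA.ChIP implementation or its API (the package itself is written in R); the function name, its parameters, and the example counts below are hypothetical, and the one-sided "enrichment" alternative is an assumption, since the abstract does not say which alternative is tested. The sketch computes the probability of seeing at least the observed overlap between an input gene list and the target genes of one ChIP-Seq dataset, given a shared background.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, n_targets, background):
    """One-sided Fisher's exact p-value for enrichment (hypothetical helper).

    overlap     -- input genes that are also TF targets
    list_size   -- genes in the input list
    n_targets   -- TF target genes in the background
    background  -- total genes in the background

    Returns P(X >= overlap), where X follows the hypergeometric
    distribution implied by the fixed margins of the 2x2 table.
    """
    largest_overlap = min(list_size, n_targets)
    total = comb(background, list_size)
    # Sum the hypergeometric probabilities of the observed overlap
    # and of every more extreme (larger) overlap.
    tail = sum(
        comb(n_targets, i) * comb(background - n_targets, list_size - i)
        for i in range(overlap, largest_overlap + 1)
    )
    return tail / total


# Made-up counts for illustration only: 10 of 100 input genes are
# targets, against 500 targets in a 20000-gene background.
p_value = fisher_enrichment_p(10, 100, 500, 20000)
```

In an analysis of this kind, a value like `p_value` would be computed once per ChIP-Seq dataset and then ranked or corrected for multiple testing; those steps are left out of the sketch.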