T-Gene: Improved target gene prediction

Timothy O’Connor, Charles E. Grant, Mikael Bodén, Timothy L. Bailey
DOI: https://doi.org/10.1101/803221
2019-10-15
Abstract

Motivation: Identifying the genes regulated by a given transcription factor (its “target genes”) is a key step in developing a comprehensive understanding of gene regulation. Previously we developed a method for predicting the target genes of a transcription factor (TF) based solely on the correlation between a histone modification at the TF’s binding site and the expression of the gene across a set of tissues. That approach is limited to organisms for which extensive histone and expression data is available, and does not explicitly incorporate the genomic distance between the TF and the gene.

Results: We present the T-Gene algorithm, which overcomes these limitations. T-Gene can be used to predict which genes are most likely to be regulated by a TF, and which of the TF’s binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene’s promoter, achieving median positive predictive value (PPV) above 50%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median PPV above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions.

Availability: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org.

Contact: timothybailey@unr.edu
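For readers unfamiliar with the metric quoted above, positive predictive value is the fraction of predicted links that are confirmed: PPV = TP / (TP + FP). The following is a minimal sketch of that calculation only. It is not T-Gene's code; the function name, the binding-site and gene labels, and the toy data are all hypothetical and chosen purely for illustration.

```python
def positive_predictive_value(predicted, validated):
    """Return PPV = TP / (TP + FP) for a collection of predicted links.

    `predicted` and `validated` are iterables of hashable items,
    here (binding_site, gene) pairs. Names are illustrative only.
    """
    predicted = set(predicted)
    if not predicted:
        raise ValueError("PPV is undefined when there are no predictions")
    # True positives: predicted links that also appear in the validated set.
    true_positives = len(predicted & set(validated))
    # Every prediction is either a true or a false positive,
    # so TP + FP equals the number of predictions.
    return true_positives / len(predicted)


# Hypothetical toy data: four predicted links, two of which are validated.
predicted_links = [
    ("site1", "GENE_A"),
    ("site2", "GENE_B"),
    ("site3", "GENE_C"),
    ("site4", "GENE_A"),
]
validated_links = {
    ("site1", "GENE_A"),
    ("site3", "GENE_C"),
    ("site5", "GENE_D"),
}

ppv = positive_predictive_value(predicted_links, validated_links)
print(f"PPV = {ppv:.2f}")
```

In this toy case two of the four predicted links are validated, so the PPV is 0.5. A "median PPV" as reported in the abstract would be the median of such values over several evaluations; how those evaluations are defined is not stated in the abstract.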