Abstract: Finding where transcription factors (TFs) bind to DNA is of key importance for deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the nucleotide patterns observed in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible and can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that scores from TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
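To make the independence assumption of the classical PWM baseline concrete, the following is a minimal sketch of log-odds PWM scoring in Python. It is not the TFFM method and not the authors' code: the function names (`build_pwm`, `score`, `best_hit`), the pseudocount of 1, the uniform background of 0.25, and the toy E-box-like sites are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned, equal-length sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        # Count each nucleotide at this position, starting from the pseudocount.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background frequency.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, kmer):
    """Sum per-position log-odds; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, kmer))

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score(pwm, sequence[i:i + width]), i) for i in windows)

# Hypothetical toy sites resembling an E-box; not real ChIP-seq data.
toy_sites = ["CACGTG", "CACGTG", "CACGTG", "CATGTG", "CACGCG"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_offset = best_hit(toy_pwm, "TTTCACGTGAAA")
```

Because `score` is a plain sum over columns, the contribution of a nucleotide at one position cannot depend on its neighbours, and every site must have the same width. These are the two limitations that the abstract says TFFMs address with hidden Markov models.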

The Next Generation of Transcription Factor Binding Site Prediction