Identification of Plant Transcription Factor DNA-Binding Sites Using seq-DAP-seq

Stephanie Hutin, Romain Blanc-Mathieu, Philippe Rieu, François Parcy, Xuelei Lai, Chloe Zubieta
DOI: https://doi.org/10.1007/978-1-0716-3354-0_9
Abstract: The identification of genome-wide transcription factor binding sites (TFBS) is a critical step in deciphering gene and transcriptional regulatory networks. However, determining the genome-wide binding of specific TFs or TF complexes remains a technical challenge. DNA affinity purification sequencing (DAP-seq) and modifications such as sequential DAP-seq (seq-DAP-seq) are robust in vitro methods for mapping individual TF or TF complex binding sites in a genome-wide manner. DAP-seq protocols use a genomic DNA (gDNA) library from any target organism with or without amplification, allowing the determination of TF binding on naked or endogenously modified DNA, respectively. As a first step, the gDNA is fragmented to ~200 bp and end-repaired, and sequencing adaptors are added. This gDNA library can be used directly, or an amplification step may be performed to remove DNA modifications such as cytosine methylation. DNA libraries are then incubated with an affinity-tagged TF or TF complex immobilized on magnetic beads. The TF or TF complex of interest is generally produced using recombinant protein expression and purified prior to DNA affinity purification. After incubation of the DNA library with the immobilized TF of interest, multiple wash steps are performed to reduce non-specific DNA binding, and the TF-DNA complexes are eluted. The eluted DNA is PCR-amplified and sequenced using next-generation sequencing. The resulting sequence reads are mapped to the corresponding reference genome, identifying potential directly bound regions and binding sites of the TF or TF complex of interest. Predictive TFBS models are generated from the bound regions using downstream bioinformatics analysis pipelines. Here, we present a detailed protocol outlining the steps required for seq-DAP-seq of a heterooligomeric TF complex (Fig. 1) and briefly describe the downstream bioinformatics pipeline used to develop a robust TFBS model from sequencing data generated from a DAP-seq experiment.
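The final modelling step can be illustrated with a minimal sketch. It is not the authors' pipeline; it only shows one common way a predictive TFBS model can be represented, namely a log-odds position weight matrix (PWM) built from aligned bound sites. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount, the uniform background of 0.25 per base, and the toy sequences are all assumptions made for illustration. Sequences are assumed to contain only A, C, G, and T.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from aligned, equal-length binding sites.

    Hypothetical helper: pseudocount and uniform background are assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        # One column per position: log2(observed frequency / background).
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds for a sequence as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))


def best_hit(pwm, region):
    """Return (score, offset) of the best-scoring window in a bound region."""
    width = len(pwm)
    hits = [
        (score(pwm, region[i:i + width]), i)
        for i in range(len(region) - width + 1)
    ]
    return max(hits)


# Toy aligned sites (made up for illustration, not real DAP-seq data).
toy_sites = ["ACGT", "ACGT", "ACGA"]
toy_pwm = build_pwm(toy_sites)
toy_score, toy_offset = best_hit(toy_pwm, "TTTTACGTTTTT")
```

In practice the aligned sites would come from motif discovery on the sequences under the strongest peaks, and a real analysis would also scan the reverse strand and evaluate the model against unbound control regions.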