Abstract: Determining transcription factor binding sites (TFBSs) is critical for understanding the molecular mechanisms that regulate gene expression under different biological conditions. Biological assays designed to map TFBSs directly require large sample sizes and intensive resources. As an alternative, the ATAC-seq assay is simple to conduct and provides genomic cleavage profiles that contain rich information for imputing TFBSs indirectly. Previous footprint-based tools are inherently limited by the accuracy of their bias correction algorithms and the efficiency of their feature extraction models. Here we introduce TAMC (Transcriptional factor binding prediction from ATAC-seq profile at Motif-predicted binding sites using Convolutional neural networks), a deep-learning approach for predicting motif-centric TF binding activity from paired-end ATAC-seq data. TAMC does not require bias correction during signal processing. By leveraging a one-dimensional convolutional neural network (1D-CNN) model, TAMC makes predictions based on both footprint and non-footprint features at binding sites for each TF and outperforms existing footprinting tools in TFBS prediction, particularly for ATAC-seq data with limited sequencing depth.

Applications of deep learning models are rapidly gaining popularity in recent biological studies because of their efficiency in analyzing non-linear patterns in feature-rich data. In this study, we developed a deep learning method to predict transcription factor binding sites based on chromatin accessibility profiles. Compared to previous methods using scoring functions and classical machine learning algorithms, our method forgoes the need for bias correction during signal processing and significantly increases the efficiency of feature extraction at transcription factor binding sites. In addition, we show that our method outperforms previous methods, particularly for chromatin accessibility data with shallow sequencing depth. We applied our method to predict changes in the binding sites of a transcription factor, CTCF, during early embryonic development based on bulk chromatin accessibility profiles. We then discuss the potential application of our method to transcription factor binding site prediction using single-cell chromatin accessibility profiles, as well as possible strategies to further improve its performance in the future.
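To make the idea of scoring a motif-centred cleavage profile with a 1D-CNN more concrete, the following is a minimal pure-Python sketch of a forward pass (convolution, ReLU, global max pooling, sigmoid). It is not the TAMC architecture: the function names, the two hand-written kernels, the output weights and the toy profiles are all hypothetical and chosen only for illustration, whereas a real model would learn its filters from labelled binding sites.

```python
import math
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Valid (unpadded) 1D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_site(profile, kernels, biases, out_weights, out_bias) -> float:
    """Conv -> ReLU -> global max pool per filter -> linear layer -> sigmoid."""
    pooled = [max(relu(conv1d(profile, k, b))) for k, b in zip(kernels, biases)]
    logit = sum(w * p for w, p in zip(out_weights, pooled)) + out_bias
    return sigmoid(logit)


# Hypothetical, hand-picked parameters (assumptions, not trained values).
kernels = [
    [1.0, -1.0, -1.0, 1.0],    # responds to a dip in the signal (footprint-like)
    [0.25, 0.25, 0.25, 0.25],  # local mean cut count (non-footprint accessibility)
]
biases = [0.0, 0.0]
out_weights = [0.5, 0.2]
out_bias = -2.0

# Toy cleavage profiles centred on a motif site (made-up cut counts).
bound_like = [5, 6, 7, 1, 0, 1, 7, 6, 5]  # accessible flanks, protected centre
flat = [1, 1, 1, 1, 1, 1, 1, 1, 1]        # low, uniform signal

bound_score = score_site(bound_like, kernels, biases, out_weights, out_bias)
flat_score = score_site(flat, kernels, biases, out_weights, out_bias)
print(f"bound-like: {bound_score:.2f}, flat: {flat_score:.2f}")
```

In this toy setup the first filter captures a footprint-like dip while the second captures overall accessibility, which mirrors the abstract's point that both footprint and non-footprint features can contribute to the prediction.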

TAMC: A deep-learning approach to predict motif-centric transcriptional factor binding activity based on ATAC-seq profile
