Abstract: Promoters are DNA sequences that bind RNA polymerase to initiate transcription, and they regulate this process through interactions with transcription factors. Accurate promoter identification is crucial for understanding how gene expression is regulated and for developing therapeutic approaches for various diseases. However, experimental techniques for promoter identification are often expensive, time-consuming, and inefficient, which motivates the development of accurate and efficient computational models. Two challenges remain significant: recognizing promoters across multiple species and making the model interpretable. In this study, we introduce GraphPro, a novel interpretable model based on graph neural networks for multi-species promoter identification. First, we encode the sequences using k-tuple nucleotide frequency patterns, dinucleotide physicochemical properties, and dna2vec. Next, we construct two feature extraction modules, one based on convolutional neural networks and one based on graph neural networks. These modules extract specific motifs from the promoters, learn the dependencies among them, and capture the underlying structural features of the promoters, providing a more comprehensive representation. Finally, a fully connected neural network predicts whether the input sequence is a promoter. We conducted extensive experiments on promoter datasets from eight species, including Human, Mouse, and Escherichia coli. The experimental results show that the average sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) of GraphPro are 0.9123, 0.9482, 0.8840, and 0.7984, respectively. Compared with previous promoter identification methods, GraphPro not only achieves better recognition accuracy on multiple species but also outperforms all of them in cross-species prediction. Furthermore, by visualizing GraphPro's decision process and analyzing the sequences that match the transcription factor binding motifs captured by the model, we validate its significant advantages in biological interpretability. The source code for GraphPro is available at https://github.com/liuliwei1980/GraphPro.
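
To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch of a k-tuple nucleotide frequency encoding and of the Sn, Sp, Acc, and MCC metrics computed from a confusion matrix. It is not taken from the GraphPro repository: the function names `kmer_frequencies` and `binary_metrics`, the default `k=3`, and the choice to skip windows containing ambiguous bases are assumptions made here for illustration, and the paper's own encoding may differ in detail.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Return the normalized frequency of every k-tuple over ACGT.

    Hypothetical helper: the output has 4**k entries in lexicographic
    order (AAA, AAC, ..., TTT for k=3).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall on promoters)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity (recall on non-promoters)
    n = tp + tn + fp + fn
    acc = (tp + tn) / n if n else 0.0            # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    features = kmer_frequencies("TATAATGCGCTATAAT", k=3)
    print(len(features))                  # 64 features for k=3
    print(binary_metrics(90, 95, 5, 10))  # illustrative counts, not results from the paper
```

The counts passed to `binary_metrics` are made-up illustrative values. In practice each metric would be computed per species on held-out data and then averaged, which is how the abstract's average values should be read.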

GraphPro: An interpretable graph neural network-based model for identifying promoters in multiple species
