GSRNet, an adversarial training-based deep framework with multi-scale CNN and BiGRU for predicting genomic signals and regions
Gancheng Zhu, Yusi Fan, Fei Li, Annebella Tsz Ho Choi, Zhikang Tan, Yiruo Cheng, Kewei Li, Siyang Wang, Changfan Luo, Hongmei Liu, Gongyou Zhang, Zhaomin Yao, Yaqi Zhang, Lan Huang, Fengfeng Zhou
DOI: https://doi.org/10.1016/j.eswa.2023.120439
IF: 8.5
2023-05-12
Expert Systems with Applications
Abstract: A genome carries many functional genomic signals and regions (GSRs), which play a vital role in orchestrating the complex biological processes in eukaryotic organisms. Precise recognition of the GSRs within a genomic sequence is the first step toward understanding genomic organization and gene regulation. Previous studies have used machine learning or deep learning algorithms to identify GSRs based on hand-crafted features, which frequently fail to capture complex patterns within the GSRs. The one-hot encoding or word2vec embedding algorithms used in several deep learning-based studies have the potential to overcome the weakness of human-designed features, but they may fail to capture contextual and positional information. The present study proposes a general-purpose end-to-end framework for GSR prediction (GSRNet), which integrates DNABERT embedding, adversarial training, BiGRU, and multi-scale CNN to eliminate human involvement in feature engineering. GSRNet is evaluated on polyadenylation signal (PAS) and translation initiation site (TIS) prediction tasks. The comparative experiments show that the proposed GSRNet outperforms the state-of-the-art methods reported in previous studies, reducing the error rate by 1.08% and 1.50% for human PAS and TIS prediction, respectively. In relative terms, the model reduces the error rate by up to 8.73% and 32.97%, respectively. The improved detection of the two types of GSRs (PAS and TIS) across four organisms confirms the effectiveness and robustness of the proposed GSRNet. The source code and the data are freely available at http://www.healthinformaticslab.org/supp/resources.php.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
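
The abstract names DNABERT embedding as the input representation. DNABERT operates on overlapping k-mer tokens rather than single nucleotides, so a minimal sketch of that tokenization step may help make the pipeline concrete. The function name `kmer_tokenize`, the choice of k = 6, and the example sequence are illustrative assumptions; the abstract does not state which k or which preprocessing GSRNet uses.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper for illustration only; it is not taken from the
    GSRNet source code. k = 6 is an assumed value.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n yields n - k + 1 tokens (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Example: a short made-up fragment starting with the canonical PAS hexamer AATAAA.
tokens = kmer_tokenize("AATAAAGCT", k=6)
print(tokens)  # ['AATAAA', 'ATAAAG', 'TAAAGC', 'AAAGCT']
```

Each token overlaps its neighbor by k - 1 bases, so every token carries local context from the bases around it. In a DNABERT-style pipeline, the resulting token list would then be mapped to vocabulary ids and passed to the pretrained model to obtain embeddings.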