Identification of DNA Modification Sites Based on Elastic Net and Bidirectional Gated Recurrent Unit with Convolutional Neural Network

Bin Yu, Yaqun Zhang, Xue Wang, Hongli Gao, Jianqiang Sun, Xin Gao
DOI: https://doi.org/10.1016/j.bspc.2022.103566
IF: 5.1
2022-01-01
Biomedical Signal Processing and Control
Abstract: DNA N4-methylcytosine (4mC) and DNA N6-methyladenine (6mA) are significant epigenetic modifications. 4mC is closely related to the restriction-modification system, and 6mA is involved in various cellular activities. To further explore their functional mechanisms and biological significance, and to overcome the narrow coverage of traditional experimental methods, an efficient and widely applicable prediction method is needed. In this work, we develop a prediction method named 4mCi6mA-BGC to predict 4mC and 6mA sites. First, we employ binary encoding, K-mer nucleotide frequency (K-mer), pseudo K-tuple nucleotide composition (PseKNC), dinucleotide-based auto covariance (DAC) and monoDiKGap theoretical description (MonoDiKGap) to encode DNA sequences. Then, the elastic net is employed for feature selection, and the optimized feature space is fed into a deep learning framework composed of a bidirectional gated recurrent unit and a convolutional neural network. The benchmark comprises six datasets, which contain 14 328 4mC sites from different species. The results of 10-fold cross-validation indicate that 4mCi6mA-BGC achieves significantly higher prediction accuracy than existing prediction methods. We also use the independent Rice and Arabidopsis thaliana datasets to further confirm the predictive ability of 4mCi6mA-BGC. Compared with the existing prediction methods, 4mCi6mA-BGC shows the best prediction performance. These comprehensive results indicate that our method can identify DNA modification sites represented by 4mC and 6mA sites.
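As an illustration of the first two encodings named in the abstract, the following is a minimal sketch of binary (one-hot) and K-mer frequency encoding of a DNA sequence. It is not the authors' implementation: the function names `binary_encode`, `kmer_frequency` and `encode`, the default `k=2`, and the handling of unknown symbols are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order for this sketch


def binary_encode(seq):
    """Binary (one-hot) encoding: each nucleotide becomes a 4-bit indicator."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper():
        bits = [0] * len(ALPHABET)
        if base in index:  # assumption: unknown symbols such as N stay all-zero
            bits[index[base]] = 1
        encoded.extend(bits)
    return encoded


def kmer_frequency(seq, k=2):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows of length k
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # windows containing unknown symbols are skipped
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def encode(seq, k=2):
    """Concatenate both encodings into a single feature vector."""
    return binary_encode(seq) + kmer_frequency(seq, k)
```

For a sequence of length n, `encode` returns 4n binary features followed by 4^k K-mer frequencies. In a pipeline like the one described, such vectors would be concatenated with the other encodings before feature selection.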