SOMM4mC: a Second-Order Markov Model for DNA N4-methylcytosine Site Prediction in Six Species

Jiali Yang, Kun Lang, Guangle Zhang, Xiaodan Fan, Yuanyuan Chen, Cong Pian, Arne Elofsson
DOI: https://doi.org/10.1093/bioinformatics/btaa507
IF: 5.8
2020-01-01
Bioinformatics
Abstract:
MOTIVATION: DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA because of its role in regulating DNA replication and protecting host DNA against degradation. An efficient algorithm for identifying 4mC sites is needed for downstream analyses.
RESULTS: In this study, we propose a new prediction method, SOMM4mC, based on a second-order Markov model, which uses the transition probabilities between adjacent nucleotides to identify 4mC sites. The results show that both the first-order and the second-order Markov models outperform the three existing algorithms in all six species for which benchmark datasets are available (Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii). Of the two, SOMM4mC classifies better than the first-order Markov model. In particular, for E. coli and C. elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher, respectively, than those of the latest method, 4mcPred-SVM. This shows that SOMM4mC captures more discriminative sequence information through the dependency between adjacent nucleotides.
AVAILABILITY AND IMPLEMENTATION: The web server of SOMM4mC is freely accessible at www.insect-genome.com/SOMM4mC.
CONTACT: chenyuanyuan@njau.edu.cn or piancong@njau.edu.cn.
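The core idea in the abstract, scoring a sequence by second-order transition probabilities between adjacent nucleotides, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the position-independent transition table, the add-one pseudocount, the log-likelihood-ratio decision rule, the zero threshold and the toy sequences are all assumptions made for illustration. One model is fitted on 4mC (positive) sequences and one on non-4mC (negative) sequences, and a query is assigned to whichever model explains it better.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumes sequences contain only these four letters


def train_second_order(seqs, pseudocount=1.0):
    """Estimate log P(x_i | x_{i-2} x_{i-1}) from a list of DNA strings.

    Hypothetical helper: counts are pooled over all positions and
    smoothed with a pseudocount so unseen transitions keep a nonzero
    probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(2, len(s)):
            counts[s[i - 2:i]][s[i]] += 1.0

    log_p = {}
    for a, b in product(ALPHABET, repeat=2):
        ctx = a + b
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        log_p[ctx] = {
            c: math.log((counts[ctx][c] + pseudocount) / total)
            for c in ALPHABET
        }
    return log_p


def log_likelihood(seq, log_p):
    """Sum of second-order transition log-probabilities along seq.

    The probability of the first two nucleotides is ignored here,
    which is a simplification.
    """
    return sum(log_p[seq[i - 2:i]][seq[i]] for i in range(2, len(seq)))


def predict_4mc(seq, pos_model, neg_model, threshold=0.0):
    """Return True if the positive model explains seq better."""
    score = log_likelihood(seq, pos_model) - log_likelihood(seq, neg_model)
    return score > threshold


# Toy data, invented purely to exercise the functions.
positives = ["ACGACGACG", "CGACGACGA"]
negatives = ["TTTTTTTTT", "ATTTTTTTA"]
pos_model = train_second_order(positives)
neg_model = train_second_order(negatives)
print(predict_4mc("ACGACGACG", pos_model, neg_model))
```

A first-order variant would condition on only the previous nucleotide (a one-letter context instead of two). The abstract reports that the second-order model performs better on the benchmark datasets.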