Reannotation of ORFs in Neisseria meningitidis MC58 Based on the Z Curve Method

WEI Wen, GUO Fengbiao, WU Yinghui
DOI: https://doi.org/10.3724/sp.j.1260.2011.00545
2011-01-01
ACTA BIOPHYSICA SINICA
Abstract: Annotated ORFs in microbial genomes can usually be divided into two groups: the first corresponds to known genes, whereas the second consists of ORFs of unknown function. Because annotation is not always accurate, it is necessary and important to confirm which ORFs in the latter group are genuine genes and which are not. Starting from the known genes in the former group, the authors combined 21 Z curve variables with a support vector machine (SVM) to re-predict the coding potential of the ORFs in the latter group. Ten-fold cross-validation showed that the average accuracy of the method was greater than 98.45% for distinguishing known genes from non-gene sequences in the Neisseria meningitidis genome. In other words, very high recognition accuracy can be obtained by combining the SVM with the Z curve method. When the model was applied to 810 hypothetical ORFs, 216 were consistently recognized as non-coding. Furthermore, functions were assigned to 341 hypothetical ORFs with high reliability using BLASTP searches. According to the COG functional categories, 30, 53, 59 and 159 newly annotated hypothetical genes belong to the categories information storage and processing, cellular processes and signaling, metabolism, and poorly characterized, respectively. Consequently, this work provides a more comprehensive and precise annotation of Neisseria meningitidis MC58 than the original GenBank and RefSeq annotations.
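To make the feature-extraction step concrete, the following is a minimal Python sketch of the standard Z curve transform applied separately to the three codon positions of an ORF. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the 21 variables, so only the nine phase-specific mononucleotide components are shown, and the function name `z_curve_phase_features` is hypothetical. In a pipeline like the one described, such vectors (extended to the full variable set) would be computed for known genes and for non-gene sequences and then passed to an SVM classifier.

```python
from collections import Counter


def z_curve_phase_features(orf: str) -> list[float]:
    """Return 9 Z curve components (x, y, z for each codon position).

    Assumption: this covers only the phase-specific mononucleotide part of
    the Z curve feature set; the abstract's full 21 variables are not
    specified there and are not reproduced here.
    """
    seq = orf.upper()
    features: list[float] = []
    for phase in range(3):
        # Bases at codon position 1, 2 or 3 (phase 0, 1 or 2).
        counts = Counter(seq[phase::3])
        n = sum(counts[b] for b in "ACGT")  # ignore ambiguous bases such as N
        if n == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / n for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purine vs. pyrimidine
            (a + c) - (g + t),  # y: amino vs. keto
            (a + t) - (g + c),  # z: weak vs. strong hydrogen bonding
        ])
    return features


if __name__ == "__main__":
    # Toy sequence: A at every first position, T at second, G at third.
    print(z_curve_phase_features("ATGATGATG"))
```

Each component lies in the range [-1, 1], so the resulting vectors need no further rescaling before being fed to a kernel-based classifier.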