Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043.

Ling-Ling Chen, Bin-Guang Ma, Na Gao
DOI: https://doi.org/10.1111/j.1742-4658.2007.06190.x
2008-01-01
FEBS Journal
Abstract: Over-annotation of hypothetical ORFs is a common phenomenon in bacterial genomes, which necessitates confirming the coding reliability of hypothetical ORFs and then predicting their functions. The important plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043 (Eca1043) is a typical case because more than a quarter of its annotated ORFs are hypothetical. Our analysis focuses on annotation of Eca1043 hypothetical ORFs and comprises two efforts: (a) based on the Z-curve method, 49 originally annotated hypothetical ORFs are recognized as noncoding, a conclusion further supported by principal components analysis and other evidence; and (b) using sequence-alignment tools and some functional resources, more than half of the hypothetical genes are assigned functions. The potential functions of 427 hypothetical genes are summarized according to the cluster of orthologous groups functional category. Moreover, 114 and 86 hypothetical genes are recognized as putative 'membrane proteins' and 'exported proteins', respectively. Reannotation of Eca1043 hypothetical ORFs will benefit research into the lifestyle, metabolism and pathogenicity of this important plant pathogen. Our study also offers a model for the reannotation of hypothetical ORFs in microbial genomes.