Functional annotation of ORFs in viral genomes from primary sequence by support vector machine approach

Liang Liang, Lianyi Han, Congzhong Cai, Wenhui Huang, Yuzong Chen, Zhiliang Ji
2007-01-01
Journal of Computational Information Systems
Abstract: Identifying putative protein-coding open reading frames (ORFs) in viral genomes can facilitate mechanistic studies of viruses and drug development. However, a substantial percentage of these ORFs have no significant sequence similarity to known proteins, which makes it difficult to probe their function. Computational methods that complement sequence alignment and clustering methods, or can be combined with them, are therefore being explored. In this work, we present a learning-algorithm-based method, SVMProt, built on support vector machines (SVMs), for the functional classification of ORFs in the complete genomes of four viruses: HIV-1, hepatitis B virus, human adenovirus, and SARS coronavirus. Of the 80 functionally annotated ORFs in these genomes, 68 (85%) have predicted functional classes that are consistent, to varying degrees, with their annotated functions; 10 (12.5%) are misclassified; and 2 (2.5%) cannot be classified because their annotated classes are not covered by SVMProt. Our study suggests that SVMProt may, to a certain extent, provide useful hints about the function of ORFs in viral genomes.
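The abstract does not describe how a primary sequence is turned into SVM input. As a rough illustration only, the sketch below encodes a sequence as its amino-acid composition plus the fractions of three hydrophobicity groups, then trains a binary linear SVM with the Pegasos stochastic sub-gradient method. The descriptor set, the polar/neutral/hydrophobic group assignment, the function names (`encode`, `train_linear_svm`, `predict`), the hyperparameters, and the toy sequences are all assumptions made for illustration; this is not SVMProt's actual descriptor set, kernel, or training procedure. A multi-class setting would typically train one such model per functional class on curated sequences.

```python
import random
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group hydrophobicity split (0 = polar, 1 = neutral,
# 2 = hydrophobic), of the kind used in composition-based descriptors.
HYDRO_GROUP: Dict[str, int] = {
    aa: group
    for group, members in enumerate(("RKEDQN", "GASTPHY", "CLVIMFW"))
    for aa in members
}


def encode(seq: str) -> List[float]:
    """Return 20 amino-acid fractions followed by 3 hydrophobicity-group fractions."""
    residues = [aa for aa in seq.upper() if aa in HYDRO_GROUP]  # skip X, *, gaps
    n = len(residues)
    if n == 0:
        return [0.0] * 23
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    groups = [0.0, 0.0, 0.0]
    for aa in residues:
        groups[HYDRO_GROUP[aa]] += 1.0 / n
    return composition + groups


def train_linear_svm(
    X: Sequence[List[float]], y: Sequence[int], lam: float = 0.01,
    epochs: int = 200, seed: int = 0,
) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the primal objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)), labels in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: List[float], x: List[float]) -> int:
    """Sign of the linear decision function (bias feature appended)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    # Made-up toy sequences: +1 = hydrophobic-rich, -1 = polar-rich.
    positives = ["LLVIFMWLLVIA", "IVLFMCWLIVLF", "MLVVIFLLWCAL"]
    negatives = ["RKEDQNRKEDQN", "DEKRNQSTDEKR", "QNRKDEGSQNRK"]
    X = [encode(s) for s in positives + negatives]
    y = [1] * len(positives) + [-1] * len(negatives)
    w = train_linear_svm(X, y)
    print(predict(w, encode("LIVFLLMWIV")), predict(w, encode("KKRREEDDQN")))
```

The pure-Python solver is used only so the sketch has no third-party dependencies; a kernel SVM (for example, one with an RBF kernel) and richer physicochemical descriptors would be the natural next steps toward a realistic functional classifier.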