Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

Heng Li, Jin-Song Liu, Zhao Xu, Jiao Jin, Lin Fang, Lei Gao, Yu-Dong Li, Xing, Shao-Gen Gao, Tao Liu, Hai-Hong Li, Yan Li, Li-Jun Fang, Hui-Min Xie, Wei-Mou Zheng, Bai-Lin Hao
DOI: https://doi.org/10.1007/s11390-005-0446-x
2005-01-01
Abstract: With several rice genome projects approaching completion, gene prediction by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNAs to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences containing 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared at the nucleotide, exon and whole-gene-structure levels using commonly accepted measures and several new measures. The test results show that performance has improved in the chronological order in which the programs appeared. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene finders.
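The abstract does not spell out its measures, so the following is only a minimal sketch of what nucleotide-level and exon-level comparison usually mean in gene-finder evaluation. It assumes the conventional definitions: sensitivity is the fraction of annotated items that were predicted, and "specificity" is the fraction of predicted items that are annotated. The function names and the exon coordinates are hypothetical and chosen for illustration; they are not data from the paper, and the paper's "several new measures" are not covered.

```python
def exon_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


def nucleotide_level(annotated, predicted):
    """Return (sensitivity, specificity) computed over individual coding positions."""
    ann = exon_positions(annotated)
    pred = exon_positions(predicted)
    true_pos = len(ann & pred)  # positions that are coding in both
    sn = true_pos / len(ann) if ann else 0.0
    sp = true_pos / len(pred) if pred else 0.0
    return sn, sp


def exon_level(annotated, predicted):
    """Return (sensitivity, specificity) counting only exons with both boundaries exact."""
    ann, pred = set(annotated), set(predicted)
    correct = len(ann & pred)  # exons matching exactly at both ends
    sn = correct / len(ann) if ann else 0.0
    sp = correct / len(pred) if pred else 0.0
    return sn, sp


# Hypothetical single-gene example: two annotated exons, three predicted exons.
annotated = [(1, 100), (201, 300)]
predicted = [(1, 100), (211, 300), (401, 450)]

nuc_sn, nuc_sp = nucleotide_level(annotated, predicted)
exon_sn, exon_sp = exon_level(annotated, predicted)
print(f"nucleotide level: Sn={nuc_sn:.3f} Sp={nuc_sp:.3f}")
print(f"exon level:       Sn={exon_sn:.3f} Sp={exon_sp:.3f}")
```

In this toy case the second predicted exon overlaps the annotation but misses its start by ten bases, so it counts almost fully at the nucleotide level and not at all at the exon level. That gap is why evaluations of this kind report several levels side by side.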