Genomic law guided gene prediction in fungi and metazoans.

Yaping Fang,Jun Li
DOI: https://doi.org/10.1504/IJCBDD.2013.052197
2013-01-01
Abstract:Protein coding gene prediction by computational approaches is a fundamental step for genome annotation. However, it is a challenge to accurately predict eukaryotic genes in silico. By surveying the model genomes, we found that the Spearman's rank correlation coefficient between the number of experimental-verified genes and the size of genomes was 0.96 for all eukaryotes except plants, indicating the relationship between genome size and the number of coding genes can be expressed with a monotonic function. Regression analysis found that the relationship of total protein coding genes over genome size followed a logarithmic equation. We integrated the equation into ab initio gene prediction software to guide the gene prediction by constraining the total number of predicted genes. We evaluated the software in three eukaryotic genomes. Results showed that >90% of false positive predictions were removed while >80% of true positives were retained, resulting in much higher specificity.
What problem does this paper attempt to address?