Inferring an Organism-Specific Optimal Threshold for Predicting Protein Coding Regions in Eukaryotes Based on a Bootstrapping Algorithm

Shanglei Xu, Nini Rao, Xi Chen, Bo Zhou
DOI: https://doi.org/10.1007/s10529-011-0525-8
2011-01-01
Biotechnology Letters
Abstract: The accuracy of prediction methods based on power spectrum analysis depends on the threshold used to discriminate between protein coding and non-coding sequences in eukaryotic genomes. Because gene structure varies among eukaryotes, it is difficult to determine the best prediction threshold for a given organism from prior biological knowledge alone. To improve the accuracy of prediction methods based on power spectrum analysis, we developed a novel method that uses a bootstrap algorithm to infer organism-specific optimal thresholds for eukaryotes. As prior information, the method requires only a few annotated protein coding regions from the organism being studied. With the optimal thresholds calculated for our test datasets, the average prediction accuracy of our method is 81%, an increase of 19% over that obtained using the same empirical threshold P = 4 for all datasets. The proposed method is simple and convenient, and it can easily be applied to infer optimal thresholds for predicting coding regions in the genomes of most organisms.
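The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that P denotes a period-3 signal-to-noise ratio (total spectral power at frequency 1/3 divided by the mean total power) and that the threshold is taken as a bootstrap-averaged low quantile of P over the annotated coding regions. The function names `period3_snr`, `bootstrap_threshold`, and `is_coding`, as well as the `quantile` and `n_boot` parameters, are hypothetical and chosen for illustration.

```python
import cmath
import random
import statistics


def period3_snr(seq):
    """Period-3 signal-to-noise ratio of a DNA sequence.

    Assumption: the ratio is the total power of the four base-indicator
    sequences at frequency 1/3, divided by the mean total power over all
    DFT frequencies. By Parseval's theorem that mean equals the number
    of A/C/G/T symbols in the sequence.
    """
    seq = seq.upper()
    # The three cube roots of unity are the only phase factors needed.
    phase = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    total_power = 0.0
    for base in "ACGT":
        coeff = sum(phase[i % 3] for i, s in enumerate(seq) if s == base)
        total_power += abs(coeff) ** 2
    n_valid = sum(seq.count(b) for b in "ACGT")
    return total_power / n_valid if n_valid else 0.0


def bootstrap_threshold(coding_seqs, n_boot=1000, quantile=0.1, seed=0):
    """Estimate a threshold from a few annotated coding regions.

    Assumption (illustrative only): resample the coding-region scores
    with replacement, take a low quantile of each resample, and average
    those quantiles over all bootstrap replicates.
    """
    rng = random.Random(seed)
    scores = [period3_snr(s) for s in coding_seqs]
    estimates = []
    for _ in range(n_boot):
        resample = sorted(rng.choices(scores, k=len(scores)))
        idx = int(quantile * (len(resample) - 1))
        estimates.append(resample[idx])
    return statistics.mean(estimates)


def is_coding(seq, threshold):
    """Call a window coding when its ratio reaches the threshold."""
    return period3_snr(seq) >= threshold


if __name__ == "__main__":
    # Toy sequences standing in for annotated coding regions.
    annotated = ["ATG" * 30, "ATGGCC" * 20, "ATGAAA" * 10]
    threshold = bootstrap_threshold(annotated)
    print("threshold:", threshold)
    print("candidate is coding:", is_coding("ATGGCA" * 15, threshold))
```

In this sketch a fixed empirical cutoff such as P = 4 would simply be passed as `threshold` to `is_coding`, whereas the bootstrap estimate adapts the cutoff to the scores observed in the organism's own annotated regions.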