Comprehensive evaluation of protein-coding sORFs prediction based on a random sequence strategy

Jiafeng Yu, Li Guo, Xianghua Dou, Wenwen Jiang, Bowen Qian, Jian Liu, Jun Wang, Chunling Wang, Congmin Xu
DOI: https://doi.org/10.52586/4943
2021-01-01
Abstract: <b>Background</b>: Small open reading frames (sORFs) with protein-coding ability present an unprecedented challenge for genome annotation because of their short sequences and low expression levels. In the past decade, only a few prediction methods have been proposed for the discovery of protein-coding sORFs, and the lack of objective, uniform negative datasets has become an important obstacle to sORF prediction. The efficiency of current sORF prediction methods needs further evaluation to provide better research strategies for protein-coding sORF discovery. <b>Methods</b>: In this work, nine mainstream existing methods for predicting the protein-coding potential of ORFs are comprehensively evaluated using a random sequence strategy. <b>Results</b>: The results show that the current methods perform poorly on different sORF datasets. For comparison, a sequence-based prediction algorithm trained on prokaryotic sORFs is proposed, and its better prediction performance indicates that the random sequence strategy can provide a feasible approach to protein-coding sORF prediction. <b>Conclusions</b>: Protein-coding sORFs are an important class of functional genomic elements, and their discovery has shed light on the dark proteome. This evaluation indicates an urgent need for specialized prediction tools for protein-coding sORFs in both eukaryotes and prokaryotes. The present work is expected to provide novel ideas for future sORF research.
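The abstract does not say how the random sequences are generated, so the following is only a minimal sketch of one plausible reading of a random sequence strategy: for each known sORF, build a length-matched random ORF-like sequence to serve as a negative example. The function names (`random_sorf`, `random_negative_set`), the uniform base composition, and the fixed `ATG` start codon are all assumptions for illustration, not the authors' procedure.

```python
import random

STOP_CODONS = ("TAA", "TAG", "TGA")
BASES = "ACGT"


def random_sorf(n_codons, rng):
    """Build a random ORF-like sequence of n_codons codons.

    Assumption: the sequence starts with ATG, ends with a stop codon,
    and has no in-frame stop codon in between. Bases are drawn uniformly.
    """
    if n_codons < 2:
        raise ValueError("need at least a start and a stop codon")
    codons = ["ATG"]
    while len(codons) < n_codons - 1:
        codon = "".join(rng.choice(BASES) for _ in range(3))
        if codon not in STOP_CODONS:  # reject premature in-frame stops
            codons.append(codon)
    codons.append(rng.choice(STOP_CODONS))
    return "".join(codons)


def random_negative_set(positives, seed=0):
    """Return one length-matched random sequence per positive sORF."""
    rng = random.Random(seed)  # seeded so the negative set is reproducible
    return [random_sorf(len(seq) // 3, rng) for seq in positives]


if __name__ == "__main__":
    # Toy positive sequences, invented purely for illustration.
    positives = ["ATGAAATAA", "ATGGCTGCTGCTTGA"]
    for pos, neg in zip(positives, random_negative_set(positives)):
        print(len(pos), len(neg), neg)
```

Matching the length of each positive keeps a classifier from separating the two classes on length alone. A real negative set would probably also need to match base or codon composition to the source genome, which this sketch does not attempt.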
cell biology, biochemistry & molecular biology