Recognition of Translation Initiation Sites of Eukaryotic Genes Based on an Em Algorithm

YH Wang,HY Ou,FB Guo
DOI: https://doi.org/10.1089/106652703322539042
IF: 1.549
2003-01-01
Journal of Computational Biology
Abstract:With the rapid increase of DNA databases of human and other eukaryotic model organisms, a large great number of genes need to be distinguished from the DNA databases. Exact recognition of translation initiation sites (TISs) of eukaryotic genes is very important to understand the translation initiation process, predict the detailed structure of eukaryotic genes, and annotate uncharacterized sequences. The problem has not been solved satisfactorily, especially for recognizing TISs of the eukaryotic genes with shorter first exons. It is an important task for extracting new features and finding new powerful algorithms for recognizing TISs of eukaryotic genes. In this paper, the important characteristics of shorter flanking fragments around TISs are extracted and an expectation-maximization (EM) algorithm based on incomplete data is used to recognize TISs of eukaryotic genes. The accuracy is up to 87.8% over a six-fold cross-validation test. The result shows that the identification variables are effectively extracted and the EM algorithm is a powerful tool to predict the TISs of eukaryotic genes. The algorithm also can be applied to other classification or clustering tasks in bioinformatics.
What problem does this paper attempt to address?