Comparative Exon Prediction Based on Heuristic Coding Region Alignment.

SJ Hsieh, CY Lin, YS Chung, CY Tang
DOI: https://doi.org/10.1109/ispan.2005.29
2005-01-01
Abstract: Identifying protein-coding genes is one of the most challenging problems in computational molecular biology. With increasing numbers of sequenced eukaryotic genomes and syntenic maps across species, it is possible to apply genomic comparison to gene recognition. Here, we propose a program, EXONALIGN, which simultaneously aligns and predicts exons between homologous genomic sequences. The program applies CORAL (coding region alignment), a heuristic linear-time alignment tool, to determine whether the regions following conserved splice signal pairs are significant. The approach, which combines intrinsic splice site strength with the conservation of protein-coding regions and exon-intron structures, reduces computation time and increases prediction accuracy. EXONALIGN was tested on the ROSETTA data set of 117 human-mouse homologous sequence pairs. At the exon level, the sensitivity and specificity of EXONALIGN are 89% and 88%, respectively, and both are 98% at the nucleotide level. The rates of missing exons and wrong exons are as low as 2%.
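The exon-level figures quoted in the abstract can be made concrete with a small sketch. The abstract does not spell out how the measures are computed, so the definitions below are an assumption: they follow the conventions commonly used in gene-prediction benchmarks, where an exon counts as correct only if both boundaries match exactly, a missing exon is an annotated exon with no overlapping prediction, and a wrong exon is a predicted exon with no overlapping annotation. The function name `exon_level_metrics` and the coordinates are hypothetical and are not taken from EXONALIGN.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive coordinates on one sequence


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_metrics(actual: List[Exon], predicted: List[Exon]) -> Dict[str, float]:
    """Exon-level accuracy under the assumed benchmark conventions.

    sensitivity   = exactly matched exons / annotated exons
    specificity   = exactly matched exons / predicted exons
    missing_exons = annotated exons with no overlapping prediction / annotated exons
    wrong_exons   = predicted exons with no overlapping annotation / predicted exons
    """
    actual_set, predicted_set = set(actual), set(predicted)
    exact = actual_set & predicted_set  # both boundaries must agree
    missing = [a for a in actual_set
               if not any(overlaps(a, p) for p in predicted_set)]
    wrong = [p for p in predicted_set
             if not any(overlaps(p, a) for a in actual_set)]

    def ratio(count: int, total: int) -> float:
        # Avoid division by zero when a set is empty.
        return count / total if total else 0.0

    return {
        "sensitivity": ratio(len(exact), len(actual_set)),
        "specificity": ratio(len(exact), len(predicted_set)),
        "missing_exons": ratio(len(missing), len(actual_set)),
        "wrong_exons": ratio(len(wrong), len(predicted_set)),
    }


if __name__ == "__main__":
    # Made-up coordinates for illustration only, not data from the paper.
    annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
    predicted = [(100, 200), (300, 410), (900, 950)]
    for name, value in exon_level_metrics(annotated, predicted).items():
        print(f"{name}: {value:.2f}")
```

Under these assumed definitions, a prediction that overlaps an annotated exon but misses a boundary lowers both sensitivity and specificity without counting as a missing or wrong exon, which is consistent with the reported missing and wrong exon rates (2%) being much lower than the exon-level error implied by 89% and 88%.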