Classification of Real and Pseudo Microrna Precursors Using Local Structure-Sequence Features and Support Vector Machine

Chenghai Xue,Fei Li,Tao He,Guo-Ping Liu,Yanda Li,Xuegong Zhang
DOI: https://doi.org/10.1186/1471-2105-6-310
IF: 3.307
2005-01-01
BMC Bioinformatics
Abstract:Abstract Background MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large amount of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. Ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding of the nature of miRNAs and for developing ab initio prediction methods that can discovery new miRNAs without known homology. Results A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs and pseudo pre-miRNAs. Support vector machine (SVM) is applied on these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and virus, without utilizing any comparative genomics information. Conclusion The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.
What problem does this paper attempt to address?