31 Discovery of Novel ncRNA by Scanning Multiple Genome Alignments

Yinghan Fu, Zhenjiang Xu, Zhi J. Lu, Shan Zhao, David H. Mathews
DOI: https://doi.org/10.1080/07391102.2013.786463
2013-01-01
Abstract: Non-coding RNAs (ncRNAs) with novel functions have recently been discovered, and transcription is now appreciated to be pervasive. Accurate and efficient de novo computational detection of ncRNA is therefore desirable. The purpose of this study was to develop an ncRNA detection method based on structural conservation. A new method called Multifind, based on Multilign (Xu & Mathews, 2011), was developed. It uses an algorithm that predicts common structures among multiple sequences and estimates the probability that the input sequences are ncRNA using a classification support vector machine (SVM). Multilign uses Dynalign (Mathews & Turner, 2002), which folds and aligns two sequences simultaneously without requiring any sequence identity; its structure prediction quality will therefore not be affected by input sequence diversity. Benchmarks showed that Multifind performs better than RNAz on test sequences extracted from the Rfam database (Gardner et al., 2011), especially on more diverse sequences. For de novo ncRNA discovery in genomes, Multifind had an advantage in low-similarity regions of genome alignments. Multifind takes about 48 hours to scan the whole yeast genome alignment, whereas RNAz takes about 4 hours; Multifind's computational requirements therefore do not present a barrier for most users.
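The abstract refers to scanning genome alignments and to low-similarity regions without showing what that looks like in practice. The following is a minimal sketch, not the Multifind implementation: it slides a fixed-size window along a multiple alignment and reports the mean pairwise identity of each window, which is one common way to tell low-similarity regions from well-conserved ones. The function names, the gap handling, and the `window` and `step` defaults are all assumptions made for illustration, and the structure prediction and SVM classification steps are omitted entirely.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of matching residues, ignoring columns where both rows are gaps."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y and x != "-")
    return matches / len(cols)


def mean_pairwise_identity(rows):
    """Average identity over all unordered pairs of aligned rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def scan_windows(alignment, window=120, step=40):
    """Yield (start, end, mean identity) for each window of the alignment.

    `window` and `step` are hypothetical illustrative values, not
    parameters taken from the paper.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, length)
        rows = [row[start:end] for row in alignment]
        yield start, end, mean_pairwise_identity(rows)


if __name__ == "__main__":
    toy = ["ACGUACGU", "ACGUACGA", "ACGU--GU"]
    for start, end, identity in scan_windows(toy, window=4, step=4):
        print(start, end, round(identity, 3))
```

In a real pipeline, each window would then be passed to a structure-based classifier; windows with low mean identity are the ones where, according to the abstract, Multifind had an advantage over RNAz.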