deepMiRGene: Deep Neural Network based Precursor microRNA Prediction

Seunghyun Park, Seonwoo Min, Hyunsoo Choi, Sungroh Yoon
DOI: https://doi.org/10.48550/arXiv.1605.00017
2016-04-30
Abstract: Since microRNAs (miRNAs) play a crucial role in post-transcriptional gene regulation, miRNA identification is one of the most essential problems in computational biology. miRNAs are usually short in length ranging between 20 and 23 base pairs. It is thus often difficult to distinguish miRNA-encoding sequences from other non-coding RNAs and pseudo miRNAs that have a similar length, and most previous studies have recommended using precursor miRNAs instead of mature miRNAs for robust detection. A great number of conventional machine-learning-based classification methods have been proposed, but they often have the serious disadvantage of requiring manual feature engineering, and their performance is limited as well. In this paper, we propose a novel miRNA precursor prediction algorithm, deepMiRGene, based on recurrent neural networks, specifically long short-term memory networks. deepMiRGene automatically learns suitable features from the data themselves without manual feature engineering and constructs a model that can successfully reflect structural characteristics of precursor miRNAs. For the performance evaluation of our approach, we have employed several widely used evaluation metrics on three recent benchmark datasets and verified that deepMiRGene delivered comparable performance among the current state-of-the-art tools.
Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses the accurate identification of precursor microRNAs (pre-miRNAs). MicroRNAs (miRNAs) play a crucial role in post-transcriptional gene regulation, so identifying them is a fundamental problem in computational biology. Because mature miRNAs are short (about 20 to 23 base pairs), their encoding sequences are hard to distinguish from other non-coding RNAs and pseudo-miRNAs of similar length, which is why most previous studies recommend detecting precursor miRNAs rather than mature miRNAs. Many conventional machine-learning classifiers have been proposed, but they typically require manual feature engineering and offer limited performance.

To address this, the paper proposes deepMiRGene, a precursor miRNA prediction algorithm based on recurrent neural networks (RNNs), specifically long short-term memory (LSTM) networks. The algorithm learns suitable features directly from the data, without manual feature engineering, and builds a model that reflects the structural characteristics of precursor miRNAs. Evaluated with several widely used metrics on three recent benchmark datasets, deepMiRGene delivers performance comparable to current state-of-the-art tools.

The main contributions of deepMiRGene are as follows:

- **No manual feature engineering**: An end-to-end deep-learning approach that needs only simple pre-processing rather than extensive domain knowledge to design hand-crafted features.
- **Handling the palindromic structure of precursor miRNAs**: A new method for learning the palindromic secondary structure of precursor miRNAs.
- **Strong cross-species performance**: The paper reports that deepMiRGene achieves the best performance on cross-species data, even when the species differ significantly.
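To make the idea of "simple pre-processing" concrete, the following is a minimal sketch of one common way to prepare an RNA sequence for a recurrent network: one-hot encoding with zero padding to a fixed length. The summary does not specify the encoding deepMiRGene uses, so the alphabet, the function name `one_hot_encode`, the `max_len` parameter, and the padding scheme are all illustrative assumptions rather than the authors' method.

```python
from typing import List

# Assumed nucleotide alphabet; the source does not state the actual encoding.
ALPHABET = "ACGU"


def one_hot_encode(seq: str, max_len: int) -> List[List[int]]:
    """Encode an RNA sequence as a max_len x 4 one-hot matrix.

    Hypothetical helper for illustration only:
    - DNA-style 'T' is treated as 'U'.
    - Unknown symbols (e.g. 'N') become all-zero rows.
    - Sequences longer than max_len are truncated; shorter ones are
      padded with all-zero rows at the end.
    """
    seq = seq.upper().replace("T", "U")
    rows: List[List[int]] = []
    for base in seq[:max_len]:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    # Zero-pad so every encoded sequence has the same number of rows.
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("ACGU", 6):
        print(row)
```

With the input `"ACGU"` and `max_len=6`, the result is four one-hot rows followed by two all-zero padding rows. A matrix of this shape could then be fed, one row per time step, to an LSTM layer in a deep-learning framework of choice.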
In conclusion, the paper aims to improve the accuracy and robustness of precursor miRNA identification by introducing a deep-learning method that removes the need for hand-crafted features.