LncDML: Identification of Long Non-Coding RNAs by Deep Metric Learning

Pengfei Zhao, Qinke Peng, Zhibo Zhu, Tian Han, Rida Dong, Huijun Huang
DOI: https://doi.org/10.1109/cac.2018.8623112
2018-01-01
Abstract: Next-generation sequencing technologies provide a large number of transcripts for bioinformatics research. In particular, because long non-coding RNAs (lncRNAs) regulate various cellular processes, research on lncRNAs is very active, and identifying lncRNAs is the basis for in-depth study of their functions. In this study, we present lncDML, an approach to identifying lncRNAs from large-scale transcripts that differs completely from previous identification methods. In our model, we extract the signal-to-noise ratio (SNR) and k-mer features from transcript sequences. First, we use only the SNR to cluster the original dataset into three parts; this step already achieves a preliminary identification effect to some extent. Then, abandoning traditional feature selection, we directly measure the relationship between each pair of samples by deep metric learning for each part of the data. Finally, a novel classifier based on complex networks is applied to achieve the final identification. The experimental results show that lncDML is a very effective method for identifying lncRNAs.
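The two sequence features named in the abstract can be illustrated with a minimal sketch. The abstract does not define either feature precisely, so everything here is an assumption: k-mer features are taken to be normalized k-mer frequencies, and the SNR is taken to be one common definition from genomic signal processing (spectral power at frequency N/3 divided by the mean power over the non-zero frequencies, computed from binary base-indicator sequences). The function names `kmer_frequencies` and `period3_snr` are hypothetical and are not taken from the paper.

```python
import cmath
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k=3):
    """Normalized frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: windows containing characters other than A/C/G/T are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def period3_snr(seq):
    """Period-3 signal-to-noise ratio of a nucleotide sequence.

    Assumed definition: P[N/3] / mean(P[1..N-1]), where P[k] is the sum over
    the four bases of |DFT of the base-indicator sequence|**2.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an integer frequency
    seq = seq[:n]
    if n < 3:
        return 0.0

    # Power at frequency index N/3; there the DFT kernel is exp(-2*pi*i*pos/3).
    peak = 0.0
    for base in BASES:
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        peak += abs(coeff) ** 2

    # Mean power over k = 1..N-1 via Parseval's theorem: for a 0/1 indicator
    # with c ones, the sum of |U[k]|**2 over all k is N*c and the DC term is c**2.
    base_counts = Counter(seq)
    noise = sum(n * base_counts[b] - base_counts[b] ** 2 for b in BASES) / (n - 1)
    return peak / noise if noise > 0 else 0.0
```

For example, `kmer_frequencies(seq, k=3)` yields a 64-dimensional vector, and `period3_snr(seq)` yields a single scalar that could serve as the clustering variable in the first step. How lncDML thresholds or clusters this value, and which k it uses, is not stated in the abstract.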