RED-ML: a Novel, Effective RNA Editing Detection Method Based on Machine Learning

Heng Xiong, Dongbing Liu, Qiye Li, Mengyue Lei, Liqin Xu, Liang Wu, Zongji Wang, Shancheng Ren, Wangsheng Li, Min Xia, Lihua Lu, Haorong Lu, Yong Hou, Shida Zhu, Xin Liu, Yinghao Sun, Jian Wang, Huanming Yang, Kui Wu, Xun Xu, Leo J Lee
DOI: https://doi.org/10.1093/gigascience/gix012
IF: 7.658
2017-01-01
GigaScience
Abstract: With the advancement of second-generation sequencing techniques, our ability to detect and quantify RNA editing on a global scale has vastly improved. As a result, RNA editing is now being studied under a growing number of biological conditions so that its biochemical mechanisms and functional roles can be better understood. However, a major barrier that prevents RNA editing detection from becoming a routine RNA-seq analysis, like gene expression and splicing analysis, is the lack of user-friendly and effective computational tools. Based on years of experience analyzing RNA editing in diverse RNA-seq datasets, we have developed a software tool, RED-ML: RNA Editing Detection based on Machine learning (pronounced “red ML”). The input to RED-ML can be as simple as a single BAM file, and the tool can also take advantage of matched genomic variant information when available. The output contains not only the detected RNA editing sites but also a confidence score to facilitate downstream filtering. We carefully designed validation experiments and performed extensive comparisons and analyses to demonstrate the efficiency and effectiveness of RED-ML under different conditions; the tool can accurately detect novel RNA editing sites without relying on curated RNA editing databases. We have also made this tool freely available via GitHub. We have developed a highly accurate, speedy and general-purpose tool for RNA editing detection using RNA-seq data. With the availability of RED-ML, it is now possible to conveniently make RNA editing detection a routine RNA-seq analysis. We believe this can greatly benefit the RNA editing research community and have a profound impact in accelerating our understanding of this intriguing posttranscriptional modification process.