Long-Read Based Novel Sequence Insertion Detection with rCANID

Tao Jiang, Yilei Fu, Bo Liu, Yadong Wang
DOI: https://doi.org/10.1109/tnb.2019.2908438
IF: 3.9
2019-01-01
IEEE Transactions on NanoBioscience
Abstract: Novel sequence insertion (NSI) is an essential category of genome structural variation (SV). An NSI is a DNA segment that is absent from the reference genome assembly. NSIs have important biological functions and correlate strongly with phenotypes and diseases. The rapid development of long-read sequencing technologies provides an opportunity to discover NSIs more sensitively, because much longer reads help both to assemble the novel sequences and to locate them. However, most state-of-the-art long-read based SV detection approaches are designed generically to detect many kinds of SVs, so they are either not suited to detecting NSIs or computationally expensive. Herein, we propose the read clustering and assembly-based novel insertion detection tool (rCANID). It applies tailored clustering of chimerically aligned and unaligned reads, together with lightweight local assembly, to reconstruct inserted sequences at low computational cost. Benchmarks on both simulated and real datasets demonstrate that rCANID discovers NSIs sensitively and efficiently, especially NSI events with long inserted sequences, which remain a non-trivial task for state-of-the-art approaches. Given its good NSI detection ability, rCANID is suited to integration into computational pipelines, where it can play important roles in many cutting-edge genomics studies.
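The abstract names two steps, clustering of chimerically aligned and unaligned reads and lightweight local assembly, but does not give their details. The Python sketch below illustrates the general idea under stated assumptions and is not rCANID's implementation. It assumes each supporting read has already been reduced to a breakpoint signature (chromosome, reference position, read ID). It groups nearby signatures into candidate NSI loci and then merges the inserted segments with a toy greedy exact-overlap assembler. All names (`Signature`, `Candidate`, `cluster_signatures`, `greedy_assemble`) and parameters (`max_gap`, `min_support`, `min_len`) are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List


@dataclass
class Signature:
    """Hypothetical NSI signal: one read's breakpoint on the reference."""
    chrom: str
    pos: int        # reference coordinate of the clip/breakpoint
    read_id: str


@dataclass
class Candidate:
    """A candidate NSI locus supported by several reads."""
    chrom: str
    pos: int                                  # median breakpoint of the cluster
    read_ids: List[str] = field(default_factory=list)


def _emit(chrom: str, group: List[Signature], min_support: int,
          out: List[Candidate]) -> None:
    # Keep a cluster only if enough *distinct* reads support it.
    reads = sorted({s.read_id for s in group})
    if len(reads) >= min_support:
        out.append(Candidate(chrom, int(median(s.pos for s in group)), reads))


def cluster_signatures(sigs: List[Signature], max_gap: int = 500,
                       min_support: int = 3) -> List[Candidate]:
    """Greedy 1-D clustering (assumed criterion): per chromosome, sort by
    position and start a new cluster when the gap exceeds max_gap."""
    by_chrom: Dict[str, List[Signature]] = {}
    for s in sigs:
        by_chrom.setdefault(s.chrom, []).append(s)

    candidates: List[Candidate] = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda s: s.pos)
        group = [ordered[0]]
        for s in ordered[1:]:
            if s.pos - group[-1].pos <= max_gap:
                group.append(s)
            else:
                _emit(chrom, group, min_support, candidates)
                group = [s]
        _emit(chrom, group, min_support, candidates)
    return candidates


def _overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(seqs: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest
    exact suffix-prefix overlap until no overlap of min_len remains."""
    contigs = list(seqs)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = _overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs
```

For example, three reads breaking within 120 bp of each other on `chr1` form one candidate at position 10040, while isolated signatures fall below `min_support` and are dropped. Calling `greedy_assemble(["ACGTAC", "TACGGA", "GGATTT"])` yields `["ACGTACGGATTT"]`. Exact suffix-prefix matching is only a stand-in: long reads are noisy, so a practical local assembler would need error-tolerant overlaps and consensus polishing.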