rCANID: Read Clustering and Assembly-based Novel Insertion Detection Tool

Yilei Fu, Tao Jiang, Bo Liu, Yadong Wang
DOI: https://doi.org/10.1109/bibm.2018.8621105
2018-01-01
Abstract: Novel sequence insertion (NSI) is a class of genome structural variation (SV) with important biological functions and strong correlations with phenotypes and diseases. The rapid development of long-read sequencing technologies provides the opportunity to study NSIs more comprehensively, since much longer reads help in assembling and locating novel sequences. However, state-of-the-art long-read-based SV detection approaches are generically designed to detect various kinds of SVs, and they use either only the signals of chimerically aligned reads or the contigs of de novo assembly, which makes them poorly suited to NSI detection and/or computationally expensive. Herein, we propose the Read Clustering and Assembly-based Novel Insertion Detection tool (rCANID), a novel long-read-based NSI detection approach. rCANID takes full advantage of chimerically aligned and unaligned reads through its specifically designed read clustering and lightweight local read assembly methods, effectively reconstructing inserted sequences at relatively low computational cost. Benchmarking on both simulated and real datasets demonstrates that rCANID can sensitively discover NSIs, especially those with large inserted novel sequences, which can be hard for state-of-the-art approaches to detect. rCANID is suited to integration into many computational pipelines and can play an important role in many genomic studies.