SVsearcher: A More Accurate Structural Variation Detection Method in Long Read Data.

Yan Zheng, Xuequn Shang, Wing-Kin Sung
DOI: https://doi.org/10.1016/j.compbiomed.2023.106843
IF: 7.7
2023-01-01
Computers in Biology and Medicine
Abstract: Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Thanks to advances in long-read sequencing (i.e., PacBio long-read sequencing and Oxford Nanopore (ONT) long-read sequencing), SVs can now be called accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and call many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors are caused by messy alignments of ONT reads, which result from their high error rate. Hence, we propose a novel method, SVsearcher, to solve these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high-coverage (50×) datasets and by more than 25% for low-coverage (10×) datasets. More importantly, SVsearcher identifies 81.7%–91.8% of multi-allelic SVs, while existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.