Sawfish: Improving long-read structural variant discovery and genotyping with local haplotype modeling

Christopher T Saunders, James M Holt, Daniel N Baker, Juniper A Lake, Jonathan R Belyeu, Zev Kronenberg, William J Rowell, Michael A Eberle
DOI: https://doi.org/10.1101/2024.08.19.608674
2024-08-20
Abstract: Structural variants (SVs) play an important role in evolutionary and functional genomics but are challenging to characterize. High-accuracy, long-read sequencing can substantially improve SV characterization when coupled with effective calling methods. While state-of-the-art long-read SV callers are highly accurate, further improvements are achievable by systematically modeling local haplotypes during SV discovery and genotyping. We describe sawfish, an SV caller for mapped high-quality long reads incorporating systematic SV haplotype modeling to improve accuracy and resolution. Assessment against the draft Genome in a Bottle (GIAB) SV benchmark from the T2T-HG002-Q100 diploid assembly shows that sawfish has the highest accuracy among state-of-the-art long-read SV callers across every tested SV size group. Additionally, sawfish maintains the highest accuracy at every tested depth level from 10- to 32-fold coverage, such that other callers required at least 30-fold coverage to match sawfish accuracy at 15-fold coverage. Sawfish also shows the highest accuracy in the GIAB challenging medically relevant genes benchmark, demonstrating improvements in both comprehensive and medically relevant contexts. When joint-genotyping 10 samples from CEPH-1463, sawfish has over 9000 more pedigree-concordant calls than other state-of-the-art SV callers, and also the highest proportion of concordant SVs (78%). Sawfish's quality model can be used to select for an even higher proportion of concordant SVs (86%), while still calling over 5000 more pedigree-concordant SVs than other callers. These results demonstrate that sawfish improves on the state of the art for long-read SV calling accuracy across both individual and joint-sample analyses. Sawfish is released as a pre-compiled Linux binary and user guide on GitHub: https://github.com/PacificBiosciences/sawfish.
Bioinformatics
What problem does this paper attempt to address?