HairSplitter: haplotype assembly from long, noisy reads

Roland Faure, Dominique Lavenier, Jean-François Flot
DOI: https://doi.org/10.1101/2024.02.13.580067
2024-10-03
Abstract: Long-read assemblers struggle to distinguish closely related viral or bacterial strains, often collapsing similar strains into a single sequence. This limitation hampers metagenome analysis, where diverse strains may harbor crucial functional distinctions. We introduce HairSplitter, a new software tool designed to retrieve strains from a strain-oblivious assembly and long reads. The method uses a custom variant-calling process that tolerates erroneous long reads, and it introduces a new read-clustering algorithm to recover an a priori unknown number of strains. On noisy long reads, HairSplitter recovers more strains than state-of-the-art tools while running faster, in both the viral and the bacterial cases.
Bioinformatics