skalo: using SKA split k-mers with coloured de Bruijn graphs to genotype indels

Romain Derelle, Kieran Madon, Nimalan Arinaminpathy, Ajit Lalvani, Simon R Harris, John A Lees, Leonid Chindelevitch
DOI: https://doi.org/10.1101/2024.10.02.616334
2024-10-03
Abstract: Insertions and deletions (indels) are important contributors to the genetic diversity and evolution of pathogens such as Mycobacterium tuberculosis. However, accurately identifying them from genomic data remains challenging with current variant calling methods. We present skalo, a graph-based algorithm that complements the popular split k-mer approach implemented in the SKA software. skalo is designed for alignment-free inference of indels, which SKA ignores, between closely related haploid genomes. The graph traversal implemented in skalo enables rapid detection of indels and complex variants while retaining the speed and alignment-free advantages of SKA. Through benchmarking on simulated and real Mycobacterium tuberculosis data, we demonstrated its ability to identify indels and complex variants with high precision, and explored their utility as phylogenetic markers to resolve relationships among isolates. By providing an efficient and easy-to-use method to extract additional variants from genomic data, skalo can enhance our understanding of pathogen evolution and transmission, with potential applications across diverse pathogen species. skalo is written in Rust and is freely available at https://github.com/rderelle/skalo.
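To illustrate why a split k-mer comparison detects substitutions but not indels, the following toy Python sketch may help. It is not the SKA or skalo implementation: the function names `split_kmers` and `snp_sites`, the flank length, and the example sequences are all hypothetical, and real tools additionally handle both strands and use compact encodings. A split k-mer is treated here as a pair of flanking k-mers around one middle base, and a variant is reported only when two sequences share the same flanks but differ in the middle base.

```python
from collections import defaultdict


def split_kmers(seq, flank=3):
    """Map (left flank, right flank) -> set of middle bases observed in seq."""
    table = defaultdict(set)
    width = 2 * flank + 1
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left, middle, right = window[:flank], window[flank], window[flank + 1:]
        table[(left, right)].add(middle)
    return table


def snp_sites(seq_a, seq_b, flank=3):
    """Return flank pairs present in both sequences whose middle bases differ."""
    a, b = split_kmers(seq_a, flank), split_kmers(seq_b, flank)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys() if a[key] != b[key]}


# Hypothetical toy sequences (assumptions for illustration only).
reference = "ACGTACGGTCA"
snp_variant = "ACGTATGGTCA"   # C -> T substitution at index 5
indel_variant = "ACGTAGGTCA"  # the same C deleted instead

print(snp_sites(reference, snp_variant))    # one shared flank pair, differing middle base
print(snp_sites(reference, indel_variant))  # empty: the deletion breaks every flank match
```

In this sketch the deletion shifts the right-hand flank of every window that spans it, so no flank pair is shared and the variant goes unreported. That gap is what the abstract describes skalo as filling through graph traversal.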
Biology