Integrated population clustering and genomic epidemiology with PopPIPE

Martin P McHugh, Samuel T Horsfield, Johanna von Wachsmann, Jacqueline Toussaint, Kerry A Pettigrew, Elzbieta Czarniak, Thomas J Evans, Alistair Leanord, Luke Tysall, Stephen H Gillespie, Kate E Templeton, Matthew T. G. Holden, Nicholas J Croucher, John A Lees
DOI: https://doi.org/10.1101/2024.12.05.626978
2024-12-09
Abstract: Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations, and as an additional source of information when detecting possible transmission events. Because bacterial genomes vary in gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among the closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during the analysis process. Here we present a Population analysis PIPEline (PopPIPE), which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than those from core genome or MLST-based approaches. Our pipeline is rapid and reproducible, creates interactive visualisations, and can easily be reconfigured and re-run on new datasets. Therefore, PopPIPE provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
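
To make the first step of the abstract concrete, the following is a minimal Python sketch of the general idea behind k-mer sketching and distance-based clustering. It is an illustration under stated assumptions, not PopPIPE's code: the function names (`kmer_sketch`, `sketch_distance`, `cluster`), the k-mer length, the sketch size, the distance threshold, and the toy sequences are all hypothetical. The technique shown is a bottom-s MinHash estimate of Jaccard distance followed by single-linkage grouping, which may differ from the methods the pipeline actually uses.

```python
import hashlib
from itertools import combinations


def kmer_sketch(seq, k=5, size=32):
    """Bottom-`size` MinHash sketch: the `size` smallest hashes of the sequence's k-mers."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = sorted(int(hashlib.sha1(km.encode()).hexdigest(), 16) for km in kmers)
    return set(hashes[:size])


def sketch_distance(a, b, size=32):
    """Estimate 1 - Jaccard similarity from two bottom-`size` sketches."""
    union = sorted(a | b)[:size]  # smallest hashes of the combined sketches
    if not union:
        return 0.0
    shared = sum(1 for h in union if h in a and h in b)
    return 1.0 - shared / len(union)


def cluster(samples, threshold, k=5, size=32):
    """Single-linkage clusters: link any two samples closer than `threshold`."""
    sketches = {name: kmer_sketch(seq, k, size) for name, seq in samples.items()}
    parent = {name: name for name in samples}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if sketch_distance(sketches[a], sketches[b], size) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy input (hypothetical sequences, far shorter than real genomes).
toy = {
    "s1": "ACGTTGCAACGTAGCTAGGA",
    "s2": "ACGTTGCAACGTAGCTAGGA",
    "s3": "CCCCCCCCCCCC",
}
print(cluster(toy, threshold=0.5))
```

In this toy setting, samples that share most of their k-mers fall into the same group, while a sample with no shared k-mers forms its own. Real analyses operate on whole-genome assemblies with much longer k-mers and model-based thresholds rather than a fixed cutoff.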
Biology