Accurate bacterial outbreak tracing with Oxford Nanopore sequencing and reduction of methylation-induced errors

Mara Lohde, Gabriel E. Wagner, Johanna Dabernig-Heinz, Adrian Viehweger, Sascha D. Braun, Stefan Monecke, Celia Diezel, Claudia Stein, Mike Marquet, Ralf Ehricht, Mathias W. Pletz, Christian Brandt
DOI: https://doi.org/10.1101/2023.09.15.556300
2024-05-13
Abstract: Our study investigated the effectiveness of Oxford Nanopore Technologies for accurate outbreak tracing by resequencing 33 isolates of a three-year-long Klebsiella pneumoniae outbreak with Illumina short read sequencing data as the point of reference. We detected considerable base errors through cgMLST and phylogenetic analysis of genomes sequenced with Oxford Nanopore Technologies, leading to the false exclusion of some outbreak-related strains from the outbreak cluster. Nearby methylation sites cause these errors and can also be found in other species besides K. pneumoniae. Based on these data, we explored PCR-based sequencing and a masking strategy, which both successfully addressed these inaccuracies and ensured accurate outbreak tracing. We offer our masking strategy as a bioinformatic workflow (MPOA is freely available on GitHub under the GNUv3 license: github.com/replikation/MPOA) to identify and mask problematic genome positions in a reference-free manner. Our research highlights limitations in using Oxford Nanopore Technologies for sequencing prokaryotic organisms, especially for investigating outbreaks. For time-critical projects that cannot wait for further technological developments by Oxford Nanopore Technologies, our study recommends either PCR-based sequencing or using our provided bioinformatic workflow. We would advise that read mapping-based quality control of genomes should be provided when publishing results.
Molecular Biology
What problem does this paper attempt to address?
This paper addresses the base errors that arise when Oxford Nanopore Technologies (ONT) sequencing is used for bacterial outbreak tracing. Specifically, the study found that when Klebsiella pneumoniae genomes were sequenced with ONT, basecalling errors near methylation sites caused some outbreak-related strains to be wrongly excluded from the outbreak cluster. These errors affected both core genome multilocus sequence typing (cgMLST) and the accuracy of phylogenetic analysis. To verify the problem, the researchers resequenced 33 K. pneumoniae isolates collected during a three-year outbreak and compared the results with Illumina short-read sequencing data. The comparison revealed several key issues:

1. **Basecalling errors**: ONT sequencing data contain clear base errors, especially near methylation sites.
2. **Wrong exclusion of outbreak-related strains**: Because of these base errors, some outbreak-related strains were wrongly excluded from the outbreak cluster.
3. **Influence of different sequencing tools and models**: Different basecallers (such as Guppy and Dorado) and sequencing kits (such as Kit12 and Kit14) affect the basecalling errors to different degrees.

To address these problems, the study proposed two strategies:

1. **PCR-based sequencing**: Use the Nanopore Rapid PCR Barcoding Kit (SQK-RPB114.24), so that the amplified DNA that is sequenced no longer carries methylated bases. This method significantly reduces the number of ambiguous sites and improves the quality of the genome.
2. **Bioinformatic masking strategy**: A bioinformatic workflow (MPOA) detects and masks ambiguous sites in ONT sequencing data. It identifies and masks problematic positions without requiring a reference genome.
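The masking idea can be illustrated with a minimal Python sketch. It is not the MPOA implementation: it assumes that reads have already been mapped back to the isolate's own assembly and summarized as per-position base counts, and the function name `mask_ambiguous` and the `min_major_freq` threshold are hypothetical choices made for illustration only.

```python
from collections import Counter


def mask_ambiguous(consensus: str, pileup: list[Counter], min_major_freq: float = 0.8) -> str:
    """Replace positions with conflicting read support by 'N'.

    consensus      -- assembled sequence of one isolate
    pileup         -- one Counter of observed read bases per consensus position
                      (assumed to come from mapping the reads to the assembly)
    min_major_freq -- hypothetical threshold: minimum fraction of reads that
                      must agree on the most common base
    """
    if len(consensus) != len(pileup):
        raise ValueError("consensus and pileup must have the same length")

    masked = []
    for base, counts in zip(consensus, pileup):
        depth = sum(counts.values())
        if depth == 0:
            # No read evidence at all: leave the base untouched in this sketch.
            masked.append(base)
            continue
        major_count = counts.most_common(1)[0][1]
        # Mask the position when the reads disagree too strongly.
        masked.append(base if major_count / depth >= min_major_freq else "N")
    return "".join(masked)


example_consensus = "ACGT"
example_pileup = [
    Counter(A=20),        # unanimous support
    Counter(C=11, T=9),   # split support, e.g. near a methylation site
    Counter(G=19, A=1),   # a single discordant read
    Counter(),            # no coverage
]
print(mask_ambiguous(example_consensus, example_pileup))
```

Because the reads are compared against the isolate's own assembly, no external reference genome is needed, which mirrors the reference-free property described above.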
With these two methods, the researchers resolved the basecalling errors in ONT sequencing data and ensured accurate outbreak tracing. The methods are particularly important for time-sensitive projects because they provide a reliable solution while further technological developments are pending. In addition, the study recommends providing read-mapping-based quality control when publishing genomic data so that the accuracy of the genomes can be assessed.
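To see why masking can restore a strain to the outbreak cluster, consider a simplified per-base distance between two aligned genomes. The study itself used cgMLST allele differences and phylogenetic analysis; the sketch below is only a stand-in under the assumption that the two sequences are already aligned and that masked positions are written as `N`. The function name `snp_distance` is hypothetical.

```python
def snp_distance(seq_a: str, seq_b: str, ignore: str = "N-") -> int:
    """Count mismatching positions between two aligned sequences,
    skipping positions where either sequence is masked or gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


illumina_like = "ACGTAC"   # toy reference-quality sequence
ont_unmasked = "ACTTAC"    # toy sequence with one erroneous base
ont_masked = "ACNTAC"      # same sequence after masking the doubtful position

print(snp_distance(illumina_like, ont_unmasked))
print(snp_distance(illumina_like, ont_masked))
```

In this toy case the unmasked sequence differs from the reference-quality sequence at one position, while the masked sequence no longer contributes a spurious difference.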