Key Steps for Robust Whole Genome Sequence Data Generation

Daniela Pinto, Gonçalo Themudo, André C. Pereira, Ana Botelho, Mónica V. Cunha

DOI: https://doi.org/10.3390/ijms25073869

IF: 5.6

2024-03-31

International Journal of Molecular Sciences

Abstract: Epidemiological surveillance of animal tuberculosis (TB) based on whole genome sequencing (WGS) of Mycobacterium bovis has recently gained traction due to its high resolution to identify infection sources, characterize the pathogen population structure, and facilitate contact tracing. However, the workflow from bacterial isolation to sequence data analysis has several technical challenges that may severely impact the power to understand the epidemiological scenario and inform outbreak response. While trying to use archived DNA from cultured samples obtained during routine official surveillance of animal TB in Portugal, we struggled against three major challenges: the low amount of M. bovis DNA obtained from routinely processed animal samples; the lack of purity of M. bovis DNA, i.e., high levels of contamination with DNA from other organisms; and the co-occurrence of more than one M. bovis strain per sample (within-host mixed infection). The loss of isolated genomes generates missed links in transmission chain reconstruction, hampering the biological and epidemiological interpretation of data as a whole. Upon identification of these challenges, we implemented an integrated solution framework based on whole genome amplification and a dedicated computational pipeline to minimize their effects and recover as many genomes as possible. With the approaches described herein, we were able to recover 62 out of 100 samples that would have otherwise been lost. Based on these results, we discuss adjustments that should be made in official and research laboratories to facilitate the sequential implementation of bacteriological culture, PCR, downstream genomics, and computational-based methods, all within a time frame that supports data-driven intervention.

biochemistry & molecular biology,chemistry, multidisciplinary

What problem does this paper attempt to address?

The paper addresses three main problems:

1. **Low DNA amount**: During routine processing, the amount of DNA recovered from samples obtained from the automatic growth detection system is low, which hinders direct whole-genome sequencing (WGS). The paper reports that this problem affects 17.8% of all samples, and that 62.7% of the affected samples come from cultures of the automatic growth detection system.
2. **High contamination rate**: The DNA extracted from these samples is heavily contaminated with DNA from other organisms, which reduces both the quality and the quantity of the sequence data. Specifically, the proportion of non-tuberculosis Mycobacterium reads in many samples is less than 50%, and some samples cannot even be classified.
3. **Mixed infection**: Some samples contain more than one Mycobacterium bovis strain, which makes it difficult to call single-nucleotide polymorphisms (SNPs) correctly.

To overcome these problems, the authors implemented a series of rescue strategies:

- **Whole-genome amplification (WGA)**: Phi29-dependent whole-genome amplification increased the double-stranded DNA concentration in the samples, enabling the recovery of samples that were originally unusable for WGS.
- **Filtering Mycobacterium reads**: Before alignment, non-Mycobacterium reads were filtered out to improve data quality.
- **Isolating mixed-infection samples**: The SplitStrains tool was used to separate the different strains in mixed-infection samples, thereby recovering some originally unusable samples.

These strategies not only recovered many originally unusable samples but also revealed which upstream steps can be adjusted before WGS to make these problems less frequent, further promoting the integration of culture-based Mycobacterium bovis detection and whole-genome sequence analysis.
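The read-filtering idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input format (a mapping from read ID to an assigned genus, with `None` for unclassified reads), the function names, and the use of 50% as a flagging cut-off are all assumptions made for illustration, loosely following the figure quoted in the summary.

```python
from collections import Counter

# Assumptions for illustration only: the genus label and the cut-off
# are not taken from the paper's actual pipeline.
TARGET_GENUS = "Mycobacterium"
MIN_TARGET_FRACTION = 0.5


def target_fraction(assignments, target=TARGET_GENUS):
    """Return the fraction of reads assigned to the target genus.

    `assignments` maps read ID -> genus name, or None when the read
    could not be classified. An empty sample yields 0.0.
    """
    if not assignments:
        return 0.0
    counts = Counter(assignments.values())
    return counts[target] / len(assignments)


def keep_target_reads(assignments, target=TARGET_GENUS):
    """Return the IDs of reads assigned to the target genus, in input order."""
    return [read_id for read_id, genus in assignments.items() if genus == target]


def is_heavily_contaminated(assignments, threshold=MIN_TARGET_FRACTION):
    """Flag a sample whose target-genus fraction falls below the threshold."""
    return target_fraction(assignments) < threshold


# Hypothetical sample: two of five reads belong to the target genus.
sample = {
    "read1": "Mycobacterium",
    "read2": "Escherichia",
    "read3": None,  # unclassified read
    "read4": "Mycobacterium",
    "read5": "Bacillus",
}
reads_to_align = keep_target_reads(sample)
flagged = is_heavily_contaminated(sample)
```

In this toy sample only `read1` and `read4` would be passed on to alignment, and the sample would be flagged because fewer than half of its reads carry the target label. In practice the genus assignments would come from a taxonomic read classifier, and the retained read IDs would be used to subset the FASTQ files before mapping.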

Emerging Challenges of Whole-Genome-sequencing–powered Epidemiological Surveillance of Globally Distributed Clonal Groups of Bacterial Infections, Giving Acinetobacter Baumannii ST195 As an Example

Whole-Genome sequencing in routine Mycobacterium bovis epidemiology – scoping the potential

A Bioinformatics Whole-Genome Sequencing Workflow for Clinical Mycobacterium tuberculosis Complex Isolate Analysis, Validated Using a Reference Collection Extensively Characterized with Conventional Methods and In Silico Approaches

Whole Genome Sequencing of Mycobacterium Tuberculosis: Current Standards and Open Issues

Automated Whole Genome Sequencing for Mycobacterium tuberculosis Analysis

Who fails to complete tuberculosis treatment? Temporal trends and risk factors for treatment interruption in a community-based directly observed therapy programme in a rural district of South Africa.

Clinical use of whole genome sequencing for Mycobacterium tuberculosis

Microfluidic Capture of Mycobacterium tuberculosis from Clinical Samples for Culture-Free Whole-Genome Sequencing

Use of Whole Genome Sequencing for Mycobacterium tuberculosis Complex Antimicrobial Susceptibility Testing

The utility of whole-genome sequencing to identify likely transmission pairs for pathogens with slow and variable evolution

A mycobacterial DNA extraction protocol designed for resource limited settings generates high quality whole genome sequencing

Strengthening the genomic surveillance of Francisella tularensis by using culture-free whole-genome sequencing from biological samples

Considering best practice standards for routine whole-genome sequencing for TB care and control

Whole genome sequencing identifies bacterial factors affecting transmission of multidrug-resistant tuberculosis in a high-prevalence setting

Comprehensive and accurate genetic variant identification from contaminated and low-coverage Mycobacterium tuberculosis whole genome sequencing data

Whole genome sequencing in clinical and public health microbiology

Towards standardisation: comparison of five whole genome sequencing (WGS) analysis pipelines for detection of epidemiologically linked tuberculosis cases

Multi-platform whole genome sequencing for tuberculosis clinical and surveillance applications

On the way to the Tokyo Summer Olympic Games (2020). Prevention of severe head and neck injuries in judo: it’s time for action

Systematic review and meta-analysis of protocols and yield of direct from sputum sequencing of Mycobacterium tuberculosis