Key Steps for Robust Whole Genome Sequence Data Generation

Daniela Pinto, Gonçalo Themudo, André C. Pereira, Ana Botelho, Mónica V. Cunha
DOI: https://doi.org/10.3390/ijms25073869
IF: 5.6
2024-03-31
International Journal of Molecular Sciences
Abstract: Epidemiological surveillance of animal tuberculosis (TB) based on whole genome sequencing (WGS) of Mycobacterium bovis has recently gained traction due to its high resolution to identify infection sources, characterize the pathogen population structure, and facilitate contact tracing. However, the workflow from bacterial isolation to sequence data analysis has several technical challenges that may severely impact the power to understand the epidemiological scenario and inform outbreak response. While trying to use archived DNA from cultured samples obtained during routine official surveillance of animal TB in Portugal, we struggled against three major challenges: the low amount of M. bovis DNA obtained from routinely processed animal samples; the lack of purity of M. bovis DNA, i.e., high levels of contamination with DNA from other organisms; and the co-occurrence of more than one M. bovis strain per sample (within-host mixed infection). The loss of isolated genomes generates missed links in transmission chain reconstruction, hampering the biological and epidemiological interpretation of data as a whole. Upon identification of these challenges, we implemented an integrated solution framework based on whole genome amplification and a dedicated computational pipeline to minimize their effects and recover as many genomes as possible. With the approaches described herein, we were able to recover 62 out of 100 samples that would have otherwise been lost. Based on these results, we discuss adjustments that should be made in official and research laboratories to facilitate the sequential implementation of bacteriological culture, PCR, downstream genomics, and computational-based methods, all within a time frame that supports data-driven intervention.
biochemistry & molecular biology; chemistry, multidisciplinary
What problem does this paper attempt to address?
The problems this paper attempts to solve fall into three areas:

1. **Low DNA amount**: During routine processing, the amount of DNA recovered from samples obtained from the automatic growth detection system is low, which hinders direct whole-genome sequencing (WGS). The paper reports that this problem affects 17.8% of all samples, and that 62.7% of the affected samples come from cultures of the automatic growth detection system.
2. **High contamination rate**: The DNA extracted from these samples contains a high level of contaminating DNA from other organisms, which reduces both the quality and the quantity of the sequence data. Specifically, the proportion of non-tuberculosis Mycobacterium reads in many samples is less than 50%, and some samples cannot even be classified.
3. **Mixed infection**: Some samples contain multiple Mycobacterium bovis strains, which makes it difficult to call single-nucleotide polymorphisms (SNPs) correctly.

To overcome these problems, the authors implemented a series of rescue strategies:

- **Whole-genome amplification (WGA)**: Phi29-dependent whole-genome amplification increased the double-stranded DNA concentration in the samples, enabling the recovery of samples that were originally unusable for WGS.
- **Filtering Mycobacterium reads**: Before alignment, non-Mycobacterium reads were filtered out to improve data quality.
- **Isolating mixed-infection samples**: The SplitStrains tool was used to separate the different strains in mixed-infection samples, recovering some samples that were originally unusable.

These strategies not only enabled the recovery of many originally unusable samples but also revealed which upstream steps before WGS can be adjusted to reduce the occurrence of these problems, further promoting the integration of culture-based Mycobacterium bovis detection and whole-genome sequence analysis.
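
The read-filtering strategy described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The summary does not name the taxonomic classifier or its output format, so the `labels` dictionary (read ID to genus) is a hypothetical stand-in for per-read classifier output. The function names and the 50% flagging threshold are also assumptions, chosen only to mirror the figure quoted in the text.

```python
from typing import Dict, List, Tuple

# (read_id, sequence, quality); a simplified view of one FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(text: str) -> List[FastqRecord]:
    """Parse plain four-line FASTQ text into (read_id, sequence, quality) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Drop the leading '@' and any description after the first space.
        records.append((header[1:].split()[0], seq, qual))
    return records


def target_fraction(labels: Dict[str, str], target: str = "Mycobacterium") -> float:
    """Fraction of classified reads whose genus label equals `target`."""
    if not labels:
        return 0.0
    hits = sum(1 for genus in labels.values() if genus == target)
    return hits / len(labels)


def keep_target_reads(
    records: List[FastqRecord],
    labels: Dict[str, str],
    target: str = "Mycobacterium",
) -> List[FastqRecord]:
    """Keep only reads labelled as `target`; unlabelled reads are dropped."""
    return [rec for rec in records if labels.get(rec[0]) == target]


# Toy input: four reads and hypothetical per-read genus labels.
fastq_text = """@r1
ACGT
+
IIII
@r2
GGCC
+
IIII
@r3
TTAA
+
IIII
@r4
CCGG
+
IIII
"""
labels = {
    "r1": "Mycobacterium",
    "r2": "Streptomyces",
    "r3": "Mycobacterium",
    "r4": "unclassified",
}

records = parse_fastq(fastq_text)
fraction = target_fraction(labels)
kept = keep_target_reads(records, labels)

# Illustrative flag only: the 50% cutoff is an assumption, not the paper's rule.
heavily_contaminated = fraction < 0.5
```

In this sketch the filtered `kept` records would be what is passed on to alignment, while `fraction` gives a quick per-sample indication of how much of the library is usable. A real workflow would read compressed paired-end FASTQ files and keep both mates of a pair together, which this example leaves out.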