Abstract: In recent times, pathogen genome sequencing has become increasingly used to investigate infectious disease outbreaks. When genomic data is sampled densely enough amongst infected individuals, it can help resolve who infected whom. However, transmission analysis cannot rely solely on a phylogeny of the genomes but must account for the within-host evolution of the pathogen, which blurs the relationship between phylogenetic and transmission trees. When only a single genome is sampled for each host, the uncertainty about who infected whom can be quite high. Consequently, transmission analysis based on multiple genomes of the same pathogen per host has a clear potential for delivering more precise results, even though it is more laborious to achieve. Here we present a new methodology that can use any number of genomes sampled from a set of individuals to reconstruct their transmission network. Furthermore, we remove the need for the assumption of a complete transmission bottleneck. We use simulated data to show that our method becomes more accurate as more genomes per host are provided, and that it can infer key infectious disease parameters such as the size of the transmission bottleneck, within-host growth rate, basic reproduction number and sampling fraction. We demonstrate the usefulness of our method in applications to real datasets from an outbreak of *Pseudomonas aeruginosa* amongst cystic fibrosis patients and a nosocomial outbreak of *Klebsiella pneumoniae*.
What problem does this paper attempt to address?
This paper addresses how to use multiple genomes per host to reconstruct pathogen transmission networks more accurately, without assuming a complete transmission bottleneck. Specifically, it proposes an extension of the TransPhylo model that handles any number of genome samples per host, improving inference of who infected whom, while also estimating key infectious disease parameters: the size of the transmission bottleneck, the within-host growth rate, the basic reproduction number, and the sampling fraction.
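To make the basic reproduction number and sampling fraction concrete, the sketch below simulates a toy outbreak as a branching process: each case infects a Poisson-distributed number of others with mean `r0`, and each case is independently sampled (has genomes available) with probability `sampling_fraction`. The Poisson offspring distribution, the independent-sampling rule, and the function names `simulate_outbreak` and `sample_poisson` are illustrative assumptions, not the paper's model or TransPhylo's API.

```python
import math
import random
from collections import deque


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_outbreak(r0, sampling_fraction, max_cases=1000, seed=1):
    """Simulate a simple branching-process outbreak and mark which cases are sampled.

    Returns (infector, sampled): infector[i] is the case that infected case i
    (None for the index case) and sampled[i] says whether case i has genomes.
    """
    rng = random.Random(seed)
    infector = [None]      # case 0 is the index case
    queue = deque([0])     # cases whose offspring have not been drawn yet
    while queue and len(infector) < max_cases:
        case = queue.popleft()
        # Hypothetical simplification: offspring count is Poisson with mean r0.
        for _ in range(sample_poisson(rng, r0)):
            if len(infector) >= max_cases:
                break
            infector.append(case)
            queue.append(len(infector) - 1)
    # Each case is sampled independently with probability sampling_fraction.
    sampled = [rng.random() < sampling_fraction for _ in infector]
    return infector, sampled


infector, sampled = simulate_outbreak(r0=1.5, sampling_fraction=0.6, max_cases=200)
print(f"{len(infector)} cases, {sum(sampled)} sampled")
```

Transmission inference works backwards from the sampled cases only, which is why the sampling fraction has to be estimated alongside the reproduction number.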
### Main problems
1. **Improve the accuracy of transmission analysis**:
   - When only one genome is sampled per host, the uncertainty about who infected whom can be high. Sampling multiple genomes per host therefore has clear potential to make transmission analysis more precise, although it requires more laboratory work.
2. **Remove the assumption of a complete transmission bottleneck**:
   - The original TransPhylo model assumes that only one pathogen lineage passes from donor to recipient at each transmission (a complete transmission bottleneck). This assumption may not hold for many pathogens, so the new method allows partial transmission bottlenecks, making the model more general.
3. **Integrate within-host diversity**:
   - Within-host pathogen diversity affects the results of transmission analysis. By integrating multiple genome samples per host, the method better captures how the pathogen evolves within each host.
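The difference between complete and partial bottlenecks can be illustrated with a toy calculation. As a hypothetical simplification (not the model used in the paper), suppose a recipient is founded by `b` pathogens and each of its `n` sampled genomes descends from one founder chosen uniformly at random. With `b = 1` all sampled lineages trace back to a single lineage that crossed the transmission; with `b > 1` several donor lineages can cross, on average b(1 - (1 - 1/b)^n) of them. The function names below are illustrative.

```python
import random


def lineages_crossing(n_sampled, bottleneck_size, seed=0):
    """Simulate how many founding lineages a recipient's sampled genomes trace back to.

    Hypothetical simplification: each of the n_sampled genomes descends from one
    of bottleneck_size founders chosen uniformly at random.
    """
    rng = random.Random(seed)
    founders = {rng.randrange(bottleneck_size) for _ in range(n_sampled)}
    return len(founders)


def expected_lineages_crossing(n_sampled, bottleneck_size):
    """Closed form of the same expectation: b * (1 - (1 - 1/b) ** n)."""
    b = bottleneck_size
    return b * (1.0 - (1.0 - 1.0 / b) ** n_sampled)


# Five genomes sampled from the recipient, for increasing bottleneck sizes.
for b in (1, 2, 5):
    print(b, round(expected_lineages_crossing(5, b), 3))
```

Under this assumption, sampling more genomes per recipient reveals more of the lineages that crossed the bottleneck, which is one intuition for why several genomes per host carry information about bottleneck size.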
### Solutions
- **Extended TransPhylo model**:
  - Handles multiple genome samples per host, allowing the transmission network to be reconstructed more accurately.
  - Removes the complete-bottleneck assumption and allows partial transmission bottlenecks, which is more realistic for many pathogens.
  - Introduces a linear growth model for the within-host pathogen population, making the model more flexible.
- **Bayesian inference**:
  - Markov chain Monte Carlo (MCMC) is used to perform Bayesian inference of the model parameters and the transmission network.
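The linear growth model can be illustrated by its effect on within-host coalescence. As an assumption for this sketch (the paper's exact parameterization may differ), let the within-host effective population size grow as N(t) = n0 + g·t, with t measured since infection. Under the standard coalescent, k lineages merge at rate k(k-1)/2 divided by N(t), so the probability of no merger between t1 and t2 is exp(-k(k-1)/2 × ∫ dt / N(t)), and the integral has the closed form ln(N(t2)/N(t1)) / g. The function name and parameters are hypothetical.

```python
import math


def no_coalescence_prob(k, t_start, t_end, n0, growth):
    """Probability that k lineages in one host do not coalesce between t_start and t_end.

    Times are measured since the host was infected. Assumed (hypothetical)
    linear growth of the effective population size: N(t) = n0 + growth * t.
    Under the standard coalescent, k lineages coalesce at rate C(k, 2) / N(t),
    so P(no event) = exp(-C(k, 2) * integral of dt / N(t) over [t_start, t_end]).
    """
    if n0 <= 0 or growth < 0 or not 0 <= t_start <= t_end:
        raise ValueError("need n0 > 0, growth >= 0 and 0 <= t_start <= t_end")
    pairs = k * (k - 1) / 2
    if growth == 0:
        integral = (t_end - t_start) / n0  # constant population size
    else:
        # integral of dt / (n0 + g t) = log(N(t_end) / N(t_start)) / g;
        # log1p keeps this accurate when growth is tiny.
        integral = math.log1p(growth * (t_end - t_start) / (n0 + growth * t_start)) / growth
    return math.exp(-pairs * integral)


# Faster growth means larger populations and therefore slower coalescence.
for g in (0.0, 1.0, 5.0):
    print(g, round(no_coalescence_prob(k=3, t_start=0.0, t_end=1.0, n0=1.0, growth=g), 4))
```

With g = 0 the expression reduces to a constant-size within-host coalescent, so the growth rate is the extra parameter that lets lineages sampled late in an infection coalesce more slowly than those sampled early.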
### Experimental results
- **Simulated data**:
  - Simulations show that the method's reconstruction of the transmission network becomes more accurate as the number of genomes sampled per host increases.
  - The method can also infer key infectious disease parameters, such as the basic reproduction number, the sampling fraction, the within-host growth rate, and the size of the transmission bottleneck.
- **Practical applications**:
  - The method is applied to two real datasets: an outbreak of *Pseudomonas aeruginosa* among cystic fibrosis patients and a nosocomial (hospital) outbreak of *Klebsiella pneumoniae*. These applications demonstrate its usefulness for inferring transmission routes and estimating key parameters in practice.
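MCMC output is a set of sampled transmission scenarios, so who-infected-whom probabilities can be read off as frequencies across draws. The sketch below assumes a hypothetical representation (one list of infector indices per MCMC draw, with `None` when the infector lies outside the sampled hosts) and scores a reconstruction by the fraction of hosts whose most probable infector matches the truth. This is one simple accuracy measure for simulation studies, not necessarily the one used in the paper; `who_infected_whom` and `infector_accuracy` are illustrative names.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical representation: draw[i] is the host that infected host i in one
# MCMC sample, or None when the infector lies outside the sampled hosts.
Draw = Sequence[Optional[int]]


def who_infected_whom(draws: Sequence[Draw]) -> List[dict]:
    """Posterior probability of each candidate infector, for every host."""
    n_hosts = len(draws[0])
    posterior = []
    for host in range(n_hosts):
        counts = Counter(draw[host] for draw in draws)
        posterior.append({source: c / len(draws) for source, c in counts.items()})
    return posterior


def infector_accuracy(draws: Sequence[Draw], truth: Draw) -> float:
    """Fraction of hosts whose most probable infector equals the true infector."""
    correct = 0
    for host, probs in enumerate(who_infected_whom(draws)):
        best = max(probs, key=probs.get)  # ties go to the first source seen
        if best == truth[host]:
            correct += 1
    return correct / len(truth)


draws = [[None, 0, 0], [None, 0, 1], [None, 0, 1]]
print(who_infected_whom(draws)[2])
print(infector_accuracy(draws, truth=[None, 0, 1]))  # → 1.0
```

Repeating such a score over simulated outbreaks with one, two, or more genomes per host is one way to check the claim that accuracy improves as more genomes are provided.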
In conclusion, by proposing an extended TransPhylo model, this paper addresses the limitations of earlier methods in handling multiple genome samples per host and partial transmission bottlenecks, which can improve the accuracy and reliability of pathogen transmission analysis.