Estimating the reproduction number and transmission heterogeneity from the size distribution of clusters of identical pathogen sequences

Cécile Tran-Kiem, Trevor Bedford
DOI: https://doi.org/10.1101/2023.04.05.23287263
2024-02-08
Abstract: Quantifying transmission intensity and heterogeneity is crucial to ascertain the threat posed by infectious diseases and inform the design of interventions. Methods that jointly estimate the reproduction number and the dispersion parameter have however mainly remained limited to the analysis of epidemiological clusters or contact tracing data, whose collection often proves difficult. Here, we show that clusters of identical sequences are imprinted by the pathogen offspring distribution, and we derive an analytical formula for the distribution of the size of these clusters. We develop and evaluate a novel inference framework to jointly estimate the reproduction number and the dispersion parameter from the size distribution of clusters of identical sequences. We then illustrate its application across a range of epidemiological situations. Finally, we develop a hypothesis testing framework relying on clusters of identical sequences to determine whether a given pathogen genetic subpopulation is associated with increased or reduced transmissibility. Our work provides new tools to estimate the reproduction number and transmission heterogeneity from pathogen sequences without building a phylogenetic tree, thus making it easily scalable to large pathogen genome datasets.
Epidemiology
What problem does this paper attempt to address?
This paper addresses how to estimate the reproduction number \(R\) and the dispersion parameter \(k\), which measures transmission heterogeneity, from the size distribution of clusters of identical pathogen sequences. The authors propose an inference framework that estimates these two epidemiological parameters from cluster sizes alone, without constructing a phylogenetic tree. The approach helps characterize pathogen transmission, particularly for acute infections with a narrow transmission bottleneck. The study also develops a hypothesis-testing framework to determine whether a given pathogen genetic subpopulation is associated with increased or reduced transmissibility, which provides a tool for assessing the transmission advantage of different genetic variants.

### Background and Objectives of the Paper

1. **Quantifying Transmission Intensity and Heterogeneity**
   - Understanding the transmission intensity and heterogeneity of infectious diseases is crucial for assessing epidemic threats and designing interventions.
   - Common methods rely on epidemiological time-series data to estimate the reproduction number \(R\), but they usually cannot estimate the dispersion parameter \(k\).
2. **Limitations of Existing Methods**
   - Most existing methods for jointly estimating \(R\) and \(k\) rely on epidemiological cluster data or contact-tracing data, which are often difficult to collect.
   - Phylogenetic methods can estimate \(R\), but they are computationally expensive on large datasets and have lower statistical power early in an outbreak or in the presence of superspreading events.
3. **Advantages of the New Method**
   - The authors propose a statistical model that estimates \(R\) and \(k\) from the size distribution of clusters of identical sequences.
   - The method does not require building a phylogenetic tree, so it scales easily to large pathogen genome datasets.

### Main Contributions

1. **Theoretical Basis**
   - The authors show that the size distribution of clusters of identical sequences is shaped by the pathogen offspring distribution.
   - They derive an analytical formula for this size distribution, which makes explicit how it depends on \(R\) and \(k\).
2. **Inference Framework**
   - A maximum-likelihood method is developed to jointly estimate \(R\) and \(k\) from the size distribution of clusters of identical sequences.
   - The performance of the method is evaluated across different epidemiological situations, including tests on data simulated for different pathogens.
3. **Application Cases**
   - The method is applied to genome data from MERS, measles and SARS-CoV-2, demonstrating its applicability to real epidemics.
   - An analysis of SARS-CoV-2 variants in Washington State shows how the method can be used to monitor the transmission advantage of different genetic variants.

### Conclusion

This study provides a tool that estimates the reproduction number \(R\) and the dispersion parameter \(k\) directly from pathogen genome data, without the need for phylogenetic tree construction. The method is scalable and computationally efficient on large datasets, and it supports both the characterization of pathogen transmission and the evaluation of control measures.
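The joint estimation of \(R\) and \(k\) described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a negative binomial offspring distribution with mean \(R\) and dispersion \(k\), and it introduces a parameter `p` (not mentioned in this summary) for the probability that a transmission event passes on an unmutated sequence, so that offspring with identical sequences have mean \(pR\) and the same dispersion \(k\). The cluster size probabilities then follow the standard final-size formula for a negative binomial branching process. The sketch further assumes that `p` is known and that every infection is sequenced. The function names, the grid search and the example counts are all hypothetical.

```python
import math


def cluster_size_logpmf(j, R, k, p):
    """Log-probability that a cluster of identical sequences has size j.

    Assumes negative binomial offspring (mean R, dispersion k) and a
    probability p that an offspring carries the same sequence as its infector.
    """
    Rp = p * R  # mean number of offspring with an identical sequence
    return (
        math.lgamma(k * j + j - 1)
        - math.lgamma(k * j)
        - math.lgamma(j + 1)
        + (j - 1) * math.log(Rp / k)
        - (k * j + j - 1) * math.log1p(Rp / k)
    )


def log_likelihood(sizes, R, k, p):
    """Log-likelihood of observed cluster sizes given as {size: count}."""
    return sum(n * cluster_size_logpmf(j, R, k, p) for j, n in sizes.items())


def grid_mle(sizes, p, R_grid, k_grid):
    """Return the (R, k) pair on the grid with the highest log-likelihood."""
    best = max(
        (log_likelihood(sizes, R, k, p), R, k) for R in R_grid for k in k_grid
    )
    return best[1], best[2]


# Hypothetical data: number of clusters observed for each cluster size.
observed = {1: 70, 2: 15, 3: 8, 5: 4, 12: 2, 30: 1}
p_assumed = 0.6  # hypothetical value, treated as known in this sketch

R_grid = [0.05 * i for i in range(2, 33)]          # 0.10 to 1.60
k_grid = [10 ** (e / 10) for e in range(-20, 11)]  # 0.01 to 10, log-spaced

R_hat, k_hat = grid_mle(observed, p_assumed, R_grid, k_grid)
print(f"R = {R_hat:.2f}, k = {k_hat:.3f}")
```

A grid search keeps the sketch free of dependencies. A numerical optimizer and likelihood-profile confidence intervals would be natural refinements, as would a correction for incomplete sequencing, which this sketch ignores.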