ScTree: Scalable and robust mechanistic integration of epidemiological and genomic data for transmission tree inference

Hannah Waddel, Katia Koelle, Max SY Lau
DOI: https://doi.org/10.1101/2024.11.20.624621
2024-11-21
Abstract: Phylodynamic models capture joint epidemiological-evolutionary dynamics during an outbreak, providing a powerful tool to enhance understanding and management of disease transmission. Existing phylodynamic approaches, however, mostly rely on various non-mechanistic or semi-mechanistic approximations of the underlying epidemiological-evolutionary process. Previous work by Lau et al. [1] has shown that full Bayesian mechanistic models, without relying on these approximations, can enable highly accurate joint inference of the epidemiological-evolutionary dynamics, including the unobserved transmission tree [1,2]. However, the Lau method faces major computational bottlenecks. As the volume of genomic data collected during outbreaks continues to grow, it is crucial to develop scalable yet accurate phylodynamic methods. Here we propose a new Bayesian phylodynamic model, overcoming the major scalability issue in the Lau 2015 method and enabling a readily deployable, yet accurate, phylodynamic modeling framework. Specifically, we develop a scalable spatiotemporal phylodynamic framework for inferring the transmission tree (ScTree) and other key epidemiological parameters, adopting the infinite sites assumption to model mutation at the sequence level, in contrast to Lau 2015, in which mutation was modeled explicitly at the nucleotide level. Our approach features a full Bayesian implementation utilizing a realistic likelihood to mechanistically integrate epidemiological and evolutionary processes. We develop a computationally efficient data-augmentation Markov Chain Monte Carlo algorithm, inferring key model parameters and unobserved dynamics including the transmission tree. We assess the performance of our method using multiple simulated outbreak datasets. Our results indicate that our method can achieve high inference accuracy, comparable to the performance of the Lau 2015 method. Additionally, our method scales significantly more efficiently for large outbreaks, with computing time increasing linearly with outbreak size, compared to the exponential scaling of the Lau method. We also demonstrate our method's utility by applying our validated modeling framework to a dataset describing a foot-and-mouth disease outbreak in the UK [3]. Our results show that our method is able to generate estimates of the transmission dynamics consistent with those from the Lau 2015 method, further demonstrating the robustness of our new approach. In summary, our method provides a computationally efficient, highly scalable, accurate modeling framework for inferring the joint spatiotemporal dynamics of epidemiological and evolutionary processes, facilitating timely and effective outbreak responses. Our method is implemented in our R package ScTree.
Biology
What problem does this paper attempt to address?
The paper addresses the computational efficiency and accuracy limitations of existing epidemiological-evolutionary dynamics models (phylodynamic models) when they are applied to large-scale outbreak data. Specifically:

1. **Computational efficiency problem**: Many existing phylodynamic methods rely on non-mechanistic or semi-mechanistic approximations. These approximations simplify the calculations, but they limit the accuracy and interpretability of the models. The method proposed by Lau et al. in 2015 avoids them and provides highly accurate joint inference (including the unobserved transmission tree), but it faces a serious computational bottleneck on large datasets: its computation time increases exponentially with outbreak size.
2. **Model accuracy problem**: To improve computational efficiency, many methods adopt two-stage inference: they first estimate the phylogenetic tree, then estimate the transmission tree conditional on that fixed phylogenetic tree. This approach assumes that the phylogenetic tree does not depend on the transmission tree, which is unrealistic and can make the systematic inference and interpretation of certain epidemiological parameters difficult.

To solve these problems, the paper proposes a new Bayesian phylodynamic model, ScTree. ScTree improves on existing methods in the following ways:

- **Infinite-sites assumption**: ScTree adopts the infinite-sites assumption to describe the evolutionary process: every mutation occurs at a new site, so no site mutates more than once or reverts. Mutation is therefore modeled at the sequence level, which avoids the cost of modeling each nucleotide site explicitly (as Lau 2015 does) and significantly improves computational efficiency.
- **Efficient MCMC algorithm**: ScTree uses a computationally efficient data-augmentation Markov Chain Monte Carlo (MCMC) algorithm that explores the high-dimensional parameter space while maintaining accuracy, inferring key model parameters and unobserved dynamics, including the transmission tree.
- **Linear scalability**: The computation time of ScTree increases linearly with outbreak size, whereas the computation time of the Lau 2015 method increases exponentially. This gives ScTree a significant computational advantage on large-scale outbreak data.

Using simulated datasets and a real foot-and-mouth disease outbreak dataset from the UK, the paper shows that ScTree substantially improves computational efficiency while maintaining high inference accuracy, making it suitable for timely outbreak response.
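
To make the infinite-sites assumption described above concrete, here is a minimal Python sketch. It is a toy illustration, not the ScTree likelihood or its R implementation: the function names (`simulate_chain`, `distance`, `poisson`), the linear transmission chain, the exponential generation times, and the parameter values are all assumptions chosen for the example. Each sequence is stored as the set of mutation IDs it carries, and every new mutation receives a fresh ID, so no site is hit twice and nothing reverts.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_chain(n_cases, mutation_rate, mean_generation_time, seed=0):
    """Simulate sequences along a linear transmission chain (case 0 -> 1 -> 2 ...).

    Hypothetical toy model: the time between successive infections is
    exponential, and the number of new mutations is Poisson with mean
    mutation_rate * elapsed_time. Under infinite sites, each mutation gets
    a brand-new site ID, so a descendant carries all ancestral mutations.
    """
    rng = random.Random(seed)
    next_site = 0
    sequences = [frozenset()]  # index case carries no mutations
    for _ in range(n_cases - 1):
        elapsed = rng.expovariate(1.0 / mean_generation_time)
        n_new = poisson(rng, mutation_rate * elapsed)
        new_sites = frozenset(range(next_site, next_site + n_new))
        next_site += n_new
        sequences.append(sequences[-1] | new_sites)
    return sequences


def distance(a, b):
    """Number of sites at which two sequences differ (symmetric difference)."""
    return len(a ^ b)


if __name__ == "__main__":
    seqs = simulate_chain(n_cases=5, mutation_rate=0.5, mean_generation_time=4.0)
    for i, s in enumerate(seqs):
        print(f"case {i}: {len(s)} mutations, distance to index case = {distance(seqs[0], s)}")
```

Because mutations never recur or revert in this sketch, genetic distance is additive along the chain and equals the number of mutation events separating two cases. That is the kind of bookkeeping that lets a sequence-level model avoid tracking every nucleotide site individually.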