ScTree: Scalable and robust mechanistic integration of epidemiological and genomic data for transmission tree inference
Hannah Waddel, Katia Koelle, Max SY Lau
Abstract: Phylodynamic models capture joint epidemiological-evolutionary dynamics during an outbreak, providing a powerful tool to enhance understanding and management of disease transmission. Existing phylodynamic approaches, however, mostly rely on various non-mechanistic or semi-mechanistic approximations of the underlying epidemiological-evolutionary process. Previous work by Lau et al. [1] has shown that fully Bayesian mechanistic models that do not rely on these approximations can enable highly accurate joint inference of the epidemiological-evolutionary dynamics, including the unobserved transmission tree [1,2]. However, the Lau 2015 method faces major computational bottlenecks. As the volume of genomic data collected during outbreaks continues to grow, it is crucial to develop scalable yet accurate phylodynamic methods. Here we propose a new Bayesian phylodynamic model that overcomes the major scalability issue in the Lau 2015 method and enables a readily deployable, yet accurate, phylodynamic modeling framework. Specifically, we develop a scalable spatiotemporal phylodynamic framework for inferring the transmission tree (ScTree) and other key epidemiological parameters. The framework adopts the infinite sites assumption and models mutation at the sequence level, in contrast to Lau 2015, in which mutation was modeled explicitly at the nucleotide level. Our approach features a fully Bayesian implementation that uses a realistic likelihood to mechanistically integrate epidemiological and evolutionary processes. We develop a computationally efficient data-augmentation Markov chain Monte Carlo algorithm for inferring key model parameters and unobserved dynamics, including the transmission tree. We assess the performance of our method using multiple simulated outbreak datasets. Our results indicate that our method can achieve high inference accuracy, comparable to that of the Lau 2015 method.
Additionally, our method scales significantly more efficiently for large outbreaks, with computing time increasing linearly with outbreak size, compared to the exponential scaling of the Lau 2015 method. We also demonstrate our method's utility by applying our validated modeling framework to a dataset describing a foot-and-mouth disease outbreak in the UK [3]. Our results show that our method generates estimates of the transmission dynamics consistent with those from the Lau 2015 method, further demonstrating the robustness of our new approach. In summary, our method provides a computationally efficient, highly scalable, and accurate modeling framework for inferring the joint spatiotemporal dynamics of epidemiological and evolutionary processes, facilitating timely and effective outbreak responses. Our method is implemented in our R package ScTree.
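
As an informal illustration of the infinite sites assumption mentioned above, the following Python sketch represents a sequence as the set of sites that have mutated, with every new mutation landing on a previously unmutated site. It is not the ScTree likelihood or the R package's interface: the function names `mutate_infinite_sites` and `distance`, the Poisson mutation process, and the rate and time values are assumptions made purely for illustration.

```python
import random


def mutate_infinite_sites(parent_sites, rate, elapsed, rng, next_site):
    """Evolve a sequence for `elapsed` time units (hypothetical helper).

    A sequence is stored as the set of site IDs that have mutated. Under the
    infinite sites assumption each mutation hits a brand-new site, so we only
    ever add fresh IDs and never revert an existing one.
    Returns the child sequence and the next unused site ID.
    """
    child = set(parent_sites)
    # Mutation events form a Poisson process (an assumption of this sketch):
    # accumulate exponential waiting times until the interval is exhausted.
    t = rng.expovariate(rate)
    while t < elapsed:
        child.add(next_site)
        next_site += 1
        t += rng.expovariate(rate)
    return child, next_site


def distance(a, b):
    """Number of sites at which two sequences differ."""
    return len(a ^ b)


# Toy transmission chain: index case -> case B -> case C (illustrative values).
rng = random.Random(2024)
index_case = set()
case_b, next_id = mutate_infinite_sites(index_case, 0.8, 5.0, rng, 0)
case_c, next_id = mutate_infinite_sites(case_b, 0.8, 3.0, rng, next_id)

print("index -> B:", distance(index_case, case_b))
print("B -> C:", distance(case_b, case_c))
print("index -> C:", distance(index_case, case_c))
```

Because no site mutates twice in this sketch, genetic distances along a transmission chain are additive: the index-to-C distance always equals the sum of the two intermediate distances. Working with mutation counts at the sequence level, rather than tracking each nucleotide, is the kind of simplification the abstract contrasts with nucleotide-level modeling.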