Early prediction of transmission clusters using supervised learning models

Omid Gheysar Gharamaleki, Caroline Colijn, Inna Sekirov, James C Johnston, Benjamin Sobkowiak
DOI: https://doi.org/10.1101/2024.04.16.24305900
2024-04-16
Abstract: Identifying individuals with tuberculosis (TB) who are at high risk of onward transmission can guide disease prevention and public health strategies. Here, we train classification models to predict the first sampled isolates in transmission clusters from demographic and disease data. We find that supervised learning models, in particular balanced random forests, can discriminate between individuals with TB who are more likely to form transmission clusters and individuals who are unlikely to transmit further, achieving AUCs of ≥ 0.75. We also identify the most important patient and disease characteristics in the best-performing classification model, including patient demographics, site of infection, TB lineage, and age at diagnosis. This framework can be used to develop predictive tools for the early assessment of a patient’s transmission risk, so that individuals can be prioritised for enhanced follow-up with the aim of reducing further transmission.
Infectious Diseases (except HIV/AIDS)
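
The abstract names two technical ingredients, balanced random forests and the AUC, without room to unpack them. The sketch below illustrates both in plain Python under stated assumptions: it uses the common formulation of a balanced forest in which each tree is trained on a bootstrap of the minority class plus an equally sized sample of the majority class, and it computes the AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The abstract does not describe the authors' implementation, so the function names (`balanced_bootstrap`, `auc`) and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return row indices for one tree: a bootstrap of the minority class
    plus an equally sized sample (with replacement) of the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    n = len(minority)
    return rng.choices(minority, k=n) + rng.choices(majority, k=n)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study):
# 1 = first sampled isolate of a transmission cluster, 0 = otherwise.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.3, 0.5, 0.05, 0.15, 0.25]

sample = balanced_bootstrap(labels, random.Random(0))
toy_auc = auc(labels, scores)

print(len(sample), sum(labels[i] for i in sample))  # 4 2
print(toy_auc)  # 0.875
```

In this toy example the two positive cases outrank 14 of the 16 positive-negative pairs, giving an AUC of 0.875. Each balanced sample contains as many positives as negatives, which is what stops the rarer class from being swamped when the individual trees are grown.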