A comparison of the value of two machine learning predictive models to support bovine tuberculosis disease control in England

M Pilar Romero, Yu-Mei Chang, Lucy A Brunton, Alison Prosser, Paul Upton, Eleanor Rees, Oliver Tearne, Mark Arnold, Kim Stevens, Julian A Drewe
DOI: https://doi.org/10.1016/j.prevetmed.2021.105264
Abstract: Nearly a decade into Defra's current eradication strategy, bovine tuberculosis (bTB) remains a serious animal health problem in England, with c. 30,000 cattle slaughtered annually in the fight against this insidious disease. There is an urgent need to improve our understanding of bTB risk in order to enhance the current disease control policy. Machine learning approaches applied to big datasets offer a potential way to do this. Regularized regression and random forest machine learning methodologies were implemented using 2016 herd-level data to generate the best possible predictive models for a bTB incident in England and its three surveillance risk areas (High-risk area [HRA], Edge area [EA] and Low-risk area [LRA]). Their predictive performance was compared and the best models in each area were used to characterize herds according to risk. While all models provided excellent discrimination, random forest models achieved the highest balanced accuracy (i.e. average of sensitivity and specificity) in England, HRA and LRA, whereas the regularized regression LASSO model did so in the EA. The time since the last confirmed incident was resolved was the only variable in the top-ten ranking in all areas according to both types of models, which highlights the importance of bTB history as a predictor of a new incident. Risk categorisation based on Receiver Operating Characteristic (ROC) analysis was carried out using the best predictive models in each area, setting a 99 % threshold value for sensitivity and specificity (97 % in the LRA). Thirteen percent of herds in the whole of England as well as in its HRA, 14 % in its EA and 31 % in its LRA were classified as high-risk. These could be selected for the deployment of additional disease control measures at national or area level. In this way, low-risk herds within the area considered would not be penalised unnecessarily by blanket control measures, and limited resources would be used more efficiently.
The methodology presented in this paper demonstrates a way to accurately identify high-risk farms to inform a targeted disease control and prevention strategy in England that supplements existing population strategies.
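As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes balanced accuracy (the average of sensitivity and specificity) and derives two ROC-style cut-offs from predicted risk scores. It is not the authors' code. The toy labels and scores, the function names, and in particular the reading of the "99 % threshold" as one cut-off that preserves the target sensitivity (below it, herds are low-risk) and one that reaches the target specificity (at or above it, herds are high-risk) are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = bTB incident)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity, as defined in the abstract."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def risk_cutoffs(y_true, scores, target=0.99):
    """Hypothetical two-cut-off scheme (an assumption, not taken from the paper).

    low_cut:  largest score threshold that still keeps sensitivity >= target.
    high_cut: smallest score threshold at which specificity >= target
              (infinity if no observed threshold reaches it).
    """
    low_cut = min(scores)
    high_cut = float("inf")
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred)
        if tp / (tp + fn) >= target:
            low_cut = thr
        if high_cut == float("inf") and tn / (tn + fp) >= target:
            high_cut = thr
    return low_cut, high_cut


def categorise(score, low_cut, high_cut):
    """Assign a herd to a risk band from its predicted score."""
    if score >= high_cut:
        return "high"
    if score < low_cut:
        return "low"
    return "medium"


# Toy data (invented for illustration): 1 = herd had a bTB incident.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]

# Balanced accuracy of a simple 0.5 cut-off on the scores.
ba = balanced_accuracy(y_true, [1 if s >= 0.5 else 0 for s in scores])

# Risk bands from the two cut-offs.
low_cut, high_cut = risk_cutoffs(y_true, scores, target=0.99)
bands = [categorise(s, low_cut, high_cut) for s in scores]
```

With so few herds, the 99 % target can only be met at 100 % sensitivity or specificity; on a real herd-level dataset the cut-offs would be read from the model's ROC curve on held-out data.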
What problem does this paper attempt to address?