Abstract: Spatiotemporal dengue forecasting using machine learning (ML) can contribute to the development of prevention and control strategies for impending dengue outbreaks. However, because cases are rare, training data for dengue incidence may contain a large proportion of zero values (zero inflation), which lowers prediction accuracy. This study aimed to (1) understand how the spatiotemporal resolution of data influences the accuracy of dengue incidence prediction using ML models, (2) understand how this influence differs between quantitative and qualitative predictions of dengue incidence, and (3) improve the accuracy of dengue incidence prediction with zero-inflated data. We predicted dengue incidence at six spatiotemporal resolutions and compared prediction accuracy across them. Six ML algorithms were compared: generalized additive models, random forests, conditional inference forest, artificial neural networks, support vector machines and regression, and extreme gradient boosting. Data from 2009 to 2012 were used for training, and data from 2013 were used for model validation with quantitative and qualitative dengue variables. To address the inaccuracy in the quantitative prediction of dengue incidence due to zero-inflated data at fine spatiotemporal scales, we developed a hybrid approach in which the second-stage quantitative prediction is performed only when and where the first-stage qualitative model predicts the occurrence of dengue cases. At higher resolutions, the dengue incidence data were zero-inflated and therefore insufficient for ML to extract quantitative relationships between dengue incidence and environmental variables. Qualitative models, which treat dengue incidence as a binary variable, eased the effect of the data distribution. Our novel hybrid approach of combining qualitative and quantitative predictions demonstrated high potential for predicting zero-inflated or rare phenomena, such as dengue. Our research contributes valuable insights to the field of spatiotemporal dengue prediction and provides an alternative solution to enhance prediction accuracy in zero-inflated data where hurdle or zero-inflated models cannot be applied.

In our study, we tackled the complex challenge of predicting dengue fever outbreaks, a crucial task in the field of epidemiology. Dengue prediction is complicated because it relies on the quality of data, which may be affected by the temporal and spatial resolution. We explored different machine learning algorithms across various spatial (village, city, and region) and temporal (weekly and monthly) resolutions. A key hurdle we encountered was the high frequency of zero values in reported dengue cases, a common issue known as zero-inflated data. This phenomenon makes accurate predictions difficult, especially at finer resolutions. To overcome this obstacle, we first made qualitative predictions about the presence or absence of dengue cases. Then, in scenarios indicating disease presence, we estimated the magnitude of cases quantitatively. We designated this method the hybrid approach; it significantly enhanced prediction accuracy in zero-inflated data. The approach can be applied to continuous data where zero-inflated or hurdle models cannot be applied. Our findings have broader implications beyond dengue prediction, shedding light on the challenges of dealing with zero-inflated data in various real-world situations. By improving our understanding of these complexities, our research contributes valuable insights that not only benefit scientists working in epidemiology but also have practical applications in public health strategies, ensuring more effective and targeted interventions.
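The two-stage logic described above can be illustrated with a minimal sketch. It is not the study's implementation: the two toy models (`ThresholdClassifier`, `LineRegressor`), the helper names, the single predictor `rain`, and the data are all hypothetical stand-ins, and fitting the second-stage model only on nonzero observations is an assumption made here for illustration. In practice, each stage would be one of the ML algorithms compared in the study.

```python
from statistics import mean


class ThresholdClassifier:
    """Toy first-stage (qualitative) model: presence if a feature reaches a cutoff."""

    def fit(self, x, present):
        # Pick the observed value that maximizes training accuracy as the cutoff.
        best_cut, best_acc = None, -1.0
        for cut in sorted(set(x)):
            acc = mean(1.0 if (xi >= cut) == p else 0.0 for xi, p in zip(x, present))
            if acc > best_acc:
                best_cut, best_acc = cut, acc
        self.cutoff = best_cut
        return self

    def predict(self, x):
        return [xi >= self.cutoff for xi in x]


class LineRegressor:
    """Toy second-stage (quantitative) model: least-squares line on one feature."""

    def fit(self, x, y):
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x):
        return [self.intercept + self.slope * xi for xi in x]


def fit_hybrid(x, incidence):
    # Stage 1 learns presence/absence from every observation.
    present = [y > 0 for y in incidence]
    clf = ThresholdClassifier().fit(x, present)
    # Stage 2 learns magnitude from nonzero observations only (an assumption
    # of this sketch); it requires at least one nonzero observation.
    pos = [(xi, yi) for xi, yi in zip(x, incidence) if yi > 0]
    reg = LineRegressor().fit([p[0] for p in pos], [p[1] for p in pos])
    return clf, reg


def predict_hybrid(clf, reg, x):
    # The quantitative prediction is used only where stage 1 predicts occurrence.
    occurs = clf.predict(x)
    amounts = reg.predict(x)
    return [max(a, 0.0) if o else 0.0 for o, a in zip(occurs, amounts)]


# Synthetic zero-inflated data: 6 of 10 observations are zero.
rain = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
cases = [0, 0, 0, 0, 0, 0, 2.0, 4.0, 6.0, 8.0]

clf, reg = fit_hybrid(rain, cases)
pred = predict_hybrid(clf, reg, [1, 5, 7, 10])
print(pred)  # [0.0, 0.0, 4.0, 10.0]
```

Because the regressor never sees the zeros, its fit is not pulled toward zero by the many absence records, while the classifier keeps the final prediction at zero where no cases are expected.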

Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction
