Harmful algal bloom forecasting. A comparison between stream and batch learning
Andres Molares-Ulloa,Elisabet Rocruz,Daniel Rivero,Xosé A. Padin,Rita Nolasco,Jesús Dubert,Enrique Fernandez-Blanco
2024-02-20
Abstract:Diarrhetic Shellfish Poisoning (DSP) is a global health threat arising from
shellfish contaminated with toxins produced by dinoflagellates. The condition,
with its widespread incidence, high morbidity rate, and persistent shellfish
toxicity, poses risks to public health and the shellfish industry. High biomass
of toxin-producing algae such as DSP are known as Harmful Algal Blooms (HABs).
Monitoring and forecasting systems are crucial for mitigating HABs impact.
Predicting harmful algal blooms involves a time-series-based problem with a
strong historical seasonal component, however, recent anomalies due to changes
in meteorological and oceanographic events have been observed. Stream Learning
stands out as one of the most promising approaches for addressing
time-series-based problems with concept drifts. However, its efficacy in
predicting HABs remains unproven and needs to be tested in comparison with
Batch Learning. Historical data availability is a critical point in developing
predictive systems. In oceanography, the available data collection can have
some constrains and limitations, which has led to exploring new tools to obtain
more exhaustive time series. In this study, a machine learning workflow for
predicting the number of cells of a toxic dinoflagellate, Dinophysis acuminata,
was developed with several key advancements. Seven machine learning algorithms
were compared within two learning paradigms. Notably, the output data from
CROCO, the ocean hydrodynamic model, was employed as the primary dataset,
palliating the limitation of time-continuous historical data. This study
highlights the value of models interpretability, fair models comparison
methodology, and the incorporation of Stream Learning models. The model DoME,
with an average R2 of 0.77 in the 3-day-ahead prediction, emerged as the most
effective and interpretable predictor, outperforming the other algorithms.
Machine Learning,Artificial Intelligence