Performance of Decision-Tree-Based Ensemble Classifiers in Predicting Fog Frequency in Ungauged Areas

Daeha Kim, Eunhee Kim, Eunji Kim
DOI: https://doi.org/10.1175/waf-d-23-0024.1
2023-11-01
Weather and Forecasting
Abstract: Fog is a phenomenon that exerts significant impacts on transportation, aviation, air quality, agriculture, and even water resources. While data-driven machine learning algorithms have shown promising performance in capturing nonlinear fog events at point locations, their applicability to different areas and time periods is questionable. This study addresses this issue by examining five decision-tree-based classifiers in a South Korean region where diverse fog formation mechanisms are at play. The five machine learning algorithms were trained at point locations and tested at other point locations for time periods independent of the training processes. Using the ensemble classifiers and high-resolution atmospheric reanalysis data, we also attempted to establish regional fog occurrence maps. Results showed that machine learning models trained on the local datasets performed better in mountainous areas, where radiative cooling predominantly contributes to fog formation, than in inland and coastal regions. As the fog generation mechanisms diversified, the tree-based ensemble models appeared to have difficulty delineating their decision boundaries. When they were trained with the reanalysis data, their predictive skill decreased significantly, resulting in high false alarm rates. This prompted the need for postprocessing techniques to rectify overestimated fog frequency. While postprocessing may ameliorate overestimation, caution is needed in interpreting the resultant fog frequency estimates, especially in regions with more diverse fog generation mechanisms. The spatial upscaling of machine learning–based fog prediction models poses challenges owing to the intricate interplay of various fog formation mechanisms, data imbalances, and potential inaccuracies in reanalysis data.
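The abstract refers to high false alarm rates and overestimated fog frequency without stating, in this record, how those quantities were computed. As a minimal sketch only, the snippet below shows how such scores are commonly derived from a binary fog/no-fog contingency table. The function names, the choice of scores, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch: standard categorical verification scores for a
# binary fog / no-fog forecast. Names and data are hypothetical, not the
# authors' code or results.

def contingency_counts(observed, predicted):
    """Return (hits, misses, false_alarms, correct_negatives)."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            hits += 1
        elif obs and not pred:
            misses += 1
        elif pred:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def verification_scores(observed, predicted):
    """Compute common scores; undefined ratios are returned as NaN."""
    h, m, f, c = contingency_counts(observed, predicted)
    nan = float("nan")
    return {
        # Fraction of observed fog events that were predicted.
        "pod": h / (h + m) if (h + m) else nan,
        # Fraction of predicted fog events that did not occur.
        "false_alarm_ratio": f / (h + f) if (h + f) else nan,
        # Fraction of non-fog cases wrongly predicted as fog.
        "pofd": f / (f + c) if (f + c) else nan,
        # Predicted fog count over observed fog count; > 1 means the
        # model overestimates fog frequency.
        "frequency_bias": (h + f) / (h + m) if (h + m) else nan,
    }


# Toy, made-up series: 1 = fog, 0 = no fog (rare-event imbalance).
observed = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = verification_scores(observed, predicted)
print(scores)
```

In this toy case the frequency bias exceeds 1, which is the kind of overestimation that a postprocessing step would aim to reduce. Because fog is a rare event, the false alarm ratio and the probability of false detection can differ substantially, so it matters which of the two a reported "false alarm rate" denotes.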
meteorology & atmospheric sciences