Spatial bagging to integrate spatial correlation into ensemble machine learning

Fehmi Özbayrak, John T. Foster, Michael J. Pyrcz
DOI: https://doi.org/10.1016/j.cageo.2024.105558
IF: 5.168
2024-02-15
Computers & Geosciences
Abstract: We propose a novel spatial bagging workflow for predictive ensemble machine learning that improves on standard bagging models. Our proposed method integrates the spatial bootstrap into bagging, using the effective sample size, n_eff, to account for the spatial context of the dataset. We benchmark its performance against standard machine learning bagging models on a large number of two-dimensional synthetic datasets with varying degrees of Gaussian noise. For noise-free datasets, both methods demonstrate equivalent accuracy; however, spatial bagging achieves this with a significantly smaller sample size, showcasing its improved efficiency. As data noise increases, spatial bagging consistently outperforms standard bagging, displaying a lower mean squared error (MSE) and greater robustness against overfitting. Our proposed spatial bagging method computes the optimal effective sample size for spatial data, reducing model overfitting. Furthermore, our proposed method requires only the additional step of variogram calculation and modeling, and can be implemented with any predictive machine learning bagging model with minimal code modification, i.e., specification of the number of bootstrap samples as the number of effective data. We recommend using spatial bagging for improved predictions in any spatial data setting across diverse scientific fields, e.g., atmospheric, agricultural, and subsurface resources.
geosciences, multidisciplinary; computer science, interdisciplinary applications
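
The "minimal code modification" mentioned in the abstract, drawing n_eff rather than n samples per bootstrap replicate, can be illustrated with a short sketch. This is not the authors' implementation: the 1-nearest-neighbor base learner, the toy grid data, the function names, and the value n_eff = 25 are all assumptions made for illustration. In the paper, n_eff is obtained from variogram calculation and modeling, which is not reproduced here.

```python
import random
from statistics import mean


def fit_nearest_neighbor(points):
    """Return a 1-nearest-neighbor predictor over (x, y, value) triples.

    Stand-in base learner (an assumption); any bagging-compatible model works.
    """
    def predict(x, y):
        nearest = min(points, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
        return nearest[2]
    return predict


def bagged_predictor(data, n_draws, n_estimators=50, seed=0):
    """Bagging in which every bootstrap replicate holds n_draws samples.

    Standard bagging uses n_draws = len(data); the spatial variant described
    in the abstract would use n_draws = n_eff instead.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        # Sample with replacement from the full dataset.
        replicate = [rng.choice(data) for _ in range(n_draws)]
        models.append(fit_nearest_neighbor(replicate))
    # The ensemble prediction is the average over all base learners.
    return lambda x, y: mean(m(x, y) for m in models)


# Hypothetical toy data: 100 points on a 10 x 10 grid, linear trend plus noise.
toy_rng = random.Random(42)
data = [(i, j, 0.5 * i + 0.2 * j + toy_rng.gauss(0.0, 1.0))
        for i in range(10) for j in range(10)]

n_eff = 25  # assumed value; the paper derives it from the variogram model

standard = bagged_predictor(data, n_draws=len(data))  # standard bagging
spatial = bagged_predictor(data, n_draws=n_eff)       # spatial bagging sketch

print(standard(4.5, 4.5), spatial(4.5, 4.5))
```

The only difference between the two ensembles is the `n_draws` argument, which mirrors the abstract's claim that the change amounts to specifying the number of bootstrap samples as the number of effective data.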