Exploring the impact of spatial autocorrelation on optimistic bias in cross-validation and assessing the effectiveness of spatial cross-validation

Musang Yoo, Hyeongmo Koo
DOI: https://doi.org/10.1080/15230406.2024.2422593
IF: 2.354
2024-11-28
Cartography and Geographic Information Science
Abstract: Spatial autocorrelation is a fundamental property of spatial data, which violates the assumption of independence between training and test datasets in general cross-validation (CV). Previous studies have reported strong positive spatial autocorrelation generally leads to optimistic biases in general CV results. Spatial CV methods have been developed to address this bias, but their effectiveness remains controversial owing to their potential for excessively pessimistic estimations. This study examines the impact of spatial autocorrelation on general CV results and validates the effectiveness of spatial CV. The first simulation explores the impact of varying spatial autocorrelation levels on the general CV results. Specifically, strong and moderate positive spatial autocorrelation introduces optimistic biases, whereas weak positive or negative spatial autocorrelations have no significant impact. The second simulation shows spatial CV methods can mitigate the optimistic biases in general CV results when dealing with spatial data having strong and moderate positive spatial autocorrelations. However, the hyperparameters of spatial CV should be adjusted based on the level of spatial autocorrelation to avoid excessively pessimistic estimations.
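To make the contrast between general CV and spatial CV concrete, the following is a minimal sketch and not the authors' simulation. It assumes a regular 20 x 20 grid of sample locations and compares random fold assignment with a simple spatial block assignment. The function names (`random_folds`, `block_folds`, `mean_nearest_train_distance`) and the `block_size` parameter are hypothetical, chosen for illustration; the abstract does not state which spatial CV methods or hyperparameters the paper evaluates. The sketch only measures how far each test point lies from its nearest training point, which is one intuitive reason random folds can look optimistic under positive spatial autocorrelation.

```python
import math
import random


def random_folds(n, k, seed=0):
    """General CV: shuffle the indices and deal them into k folds."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    folds = [0] * n
    for position, index in enumerate(order):
        folds[index] = position % k
    return folds


def block_folds(coords, block_size, k):
    """Spatial block CV (illustrative): points in the same square block
    always share a fold; blocks are dealt to folds round-robin."""
    block_of = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    block_ids = {b: i for i, b in enumerate(sorted(set(block_of)))}
    return [block_ids[b] % k for b in block_of]


def mean_nearest_train_distance(coords, folds, k):
    """Average, over all held-out points, of the distance to the
    nearest training point (any point in a different fold)."""
    total = 0.0
    for fold in range(k):
        train = [c for c, f in zip(coords, folds) if f != fold]
        test = [c for c, f in zip(coords, folds) if f == fold]
        for tx, ty in test:
            total += min(math.hypot(tx - x, ty - y) for x, y in train)
    return total / len(coords)


# Assumed setup: unit-spaced grid, 5 folds, 5 x 5 blocks (all hypothetical).
coords = [(float(x), float(y)) for x in range(20) for y in range(20)]
k = 5
random_fold_ids = random_folds(len(coords), k)
block_fold_ids = block_folds(coords, block_size=5.0, k=k)

random_gap = mean_nearest_train_distance(coords, random_fold_ids, k)
block_gap = mean_nearest_train_distance(coords, block_fold_ids, k)
print(f"random folds: mean test-to-train distance = {random_gap:.3f}")
print(f"block folds:  mean test-to-train distance = {block_gap:.3f}")
```

With random folds, almost every test point has a training point right next to it, so a model can exploit spatially correlated neighbours. Block folds push test points further from the training data, and a larger `block_size` widens that gap, which fits the abstract's caution that spatial CV hyperparameters need tuning to the level of autocorrelation to avoid overly pessimistic estimates.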
geography