Ecological Dissimilarity Matters More Than Geographical Distance When Predicting Land Surface Indicators Using Machine Learning

Bo Zhou, Gregory S. Okin, Junzhe Zhang, Shannon L. Savage, Christopher J. Cole, Michael C. Duniway
DOI: https://doi.org/10.1109/tgrs.2024.3404240
IF: 8.2
2024-06-07
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Supervised training techniques, such as those used in machine learning, generally use large sets of in situ data to train models that can, in turn, be used to make predictions (or prediction maps) about the Earth's surface in times or places where no in situ data exist. The purpose of the present study is to investigate, using a very large set of in situ data from across the western United States (U.S.), the conditions under which training data from a geographic region other than the one where predictions are desired may be substituted. To do this, we train models using in situ data from level IV ecoregions and test how well these models predict surface conditions in different ecoregions. We characterize the difference between each possible pair of ecoregions in terms of geographical (centroid-to-centroid) distance and "ecological dissimilarity." Ecological dissimilarity between pairs of ecoregions is defined in two ways: 1) as the Euclidean distance in the multivariate space defined by in situ indicators designed for monitoring purposes and 2) in terms of the difference in temporal behavior from model- and remote sensing-derived datasets. Although, overall, prediction error increases with geographical distance between training and testing ecoregions, our results indicate that ecological dissimilarity can be used to predict the error expected from a model trained with data from one ecoregion when applied in a different ecoregion.
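The two pairwise measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the haversine formula for centroid-to-centroid distance, the assumption that indicator values are already on comparable scales, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def ecological_dissimilarity(a, b):
    """Euclidean distance between two equal-length indicator vectors."""
    if len(a) != len(b):
        raise ValueError("indicator vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_distance_km(c1, c2, radius_km=6371.0):
    """Great-circle (haversine) distance between two (lat, lon) centroids.

    Assumption: the abstract only says "centroid-to-centroid"; haversine
    on a spherical Earth is one common way to compute such a distance.
    """
    lat1, lon1 = map(math.radians, c1)
    lat2, lon2 = map(math.radians, c2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Hypothetical ecoregions with made-up centroids (degrees) and
# made-up mean indicator values (assumed to share a common scale).
ecoregions = {
    "A": {"centroid": (36.0, -115.0), "indicators": [0.60, 0.10, 0.30]},
    "B": {"centroid": (40.0, -110.0), "indicators": [0.20, 0.50, 0.30]},
}

d_eco = ecological_dissimilarity(
    ecoregions["A"]["indicators"], ecoregions["B"]["indicators"]
)
d_geo = centroid_distance_km(
    ecoregions["A"]["centroid"], ecoregions["B"]["centroid"]
)
print(f"ecological dissimilarity: {d_eco:.3f}")
print(f"centroid distance (km): {d_geo:.1f}")
```

Computing both values for every training/testing pair of ecoregions would give two candidate predictors that could then be compared against the observed prediction error of a model transferred between the pair.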
engineering, electrical & electronic, imaging science & photographic technology, remote sensing, geochemistry & geophysics