Abstract: Satellite imagery has been widely used to map urbanization processes. To address the urgent need for urban landscape mapping that goes beyond urban footprint analysis, the local climate zone (LCZ) scheme has been increasingly used to reveal the urban forms and functions important to urban heat islands and micro-climates across the globe. As with most supervised classification strategies, proper application of training data is critical for the success of LCZ classification models. However, the collection and application of LCZ training areas bring two challenges that may affect mapping success. First, because digitizing training areas is time-consuming, there is a broad effort in the LCZ mapping community to crowdsource training data from different experts. However, this strategy likely leads to inconsistent labels that could weaken models. Second, the LCZ labeling process typically involves delineating large zones from which multiple training samples are drawn, but those samples are likely spatially autocorrelated and can lead to overly optimistic estimates of model accuracy. Although both effects, inconsistent labeling and spatial autocorrelation, are theoretically possible, it is unknown whether they substantially affect accuracy. We investigated both issues, specifically asking: (i) how do discrepancies in LCZ labeling by different experts affect broad-scale LCZ mapping? and (ii) to what extent does spatial autocorrelation affect model predictive power? We used two classifiers (Random Forests and ResNets) to map eight metropolitan areas in the US into LCZs, comparing training areas drawn by different interpreters with those drawn by consistent interpreters, and comparing data-splitting strategies that allow or reduce spatial autocorrelation.
First, we found large discrepancies among results built from crowdsourced training areas digitized by different experts; improving the consistency of labels can lead to substantial improvements in LCZ classification accuracy. Second, we found that spatial autocorrelation can boost the apparent accuracy of the classifier by 16% to 21%, leading to erroneous interpretation of mapping results. The two effects also interact: spatial autocorrelation in the raw data can lead to an underestimation of the model's predictive error when modeling with highly inconsistent crowdsourced training areas. Because of the uncertainty in the labeling process and the spatial autocorrelation in derived training data, broad-scale LCZ mapping results should be interpreted with caution.
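
The contrast between splitting rules that allow or reduce spatial autocorrelation can be illustrated with a small sketch. The abstract does not state the exact rules the authors used, so the following is only one plausible reading, under the assumption that each training sample carries the identifier of the digitized training area (polygon) it was drawn from. The function names, the `polygon_ids` list, and the toy data are hypothetical. A sample-level random split lets neighboring samples from the same polygon land in both the training and test sets, while a polygon-level split keeps every polygon entirely on one side.

```python
import random
from collections import defaultdict


def split_by_polygon(polygon_ids, test_fraction=0.3, seed=0):
    """Split sample indices so that no polygon contributes to both sets.

    polygon_ids[i] is the (hypothetical) training-area ID of sample i.
    """
    # Collect the sample indices belonging to each polygon.
    groups = defaultdict(list)
    for idx, pid in enumerate(polygon_ids):
        groups[pid].append(idx)

    # Shuffle polygons, not samples, so whole polygons are held out.
    unique_ids = sorted(groups)
    random.Random(seed).shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))

    test_idx = [i for pid in unique_ids[:n_test] for i in groups[pid]]
    train_idx = [i for pid in unique_ids[n_test:] for i in groups[pid]]
    return sorted(train_idx), sorted(test_idx)


def split_randomly(n_samples, test_fraction=0.3, seed=0):
    """Sample-level random split that ignores polygon membership."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def shared_polygons(polygon_ids, train_idx, test_idx):
    """Return the polygons that appear on both sides of a split."""
    train_polys = {polygon_ids[i] for i in train_idx}
    test_polys = {polygon_ids[i] for i in test_idx}
    return train_polys & test_polys


# Hypothetical toy data: 10 polygons with 5 samples each.
polygon_ids = [p for p in range(10) for _ in range(5)]

train_g, test_g = split_by_polygon(polygon_ids)
train_r, test_r = split_randomly(len(polygon_ids))

print("polygons shared, polygon-level split:",
      len(shared_polygons(polygon_ids, train_g, test_g)))
print("polygons shared, sample-level split:",
      len(shared_polygons(polygon_ids, train_r, test_r)))
```

Evaluating the same classifier on both splits and comparing the resulting accuracies is one way to estimate how much of the apparent accuracy comes from spatially autocorrelated samples. Holding out whole polygons reduces leakage but does not remove it when neighboring polygons are themselves correlated; holding out spatial blocks or entire cities would be a stricter variant.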

Classifying Natural-Language Spatial Relation Terms with Random Forest Algorithm

Interpreting the Fuzzy Semantics of Natural-Language Spatial Relation Terms with the Fuzzy Random Forest Algorithm

Very High Resolution Remote Sensing Imagery Classification Using a Fusion of Random Forest and Deep Learning Technique—Subtropical Area for Example

A Truly Spatial Random Forests Algorithm for Geoscience Data Analysis and Modelling

Semantic Classification of Urban Buildings Combining VHR Image and GIS Data: an Improved Random Forest Approach

GeoRF: a geospatial random forest

Examining the Spatially Varying Relationships between Landslide Susceptibility and Conditioning Factors Using a Geographical Random Forest Approach: A Case Study in Liangshan, China

Mapping China’s Regional Economic Activity by Integrating Points-of-interest and Remote Sensing Data with Random Forest

Local Population Mapping Using a Random Forest Model Based on Remote and Social Sensing Data: A Case Study in Zhengzhou, China

A Population Spatialization Model at the Building Scale Using Random Forest

Spatial Factor Models for High-Dimensional and Large Spatial Data: An Application in Forest Variable Mapping

Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables

Random Spatial Forests

Application of training data affects success in broad-scale local climate zone mapping

Understanding the spatial dimension of natural language by measuring the spatial semantic similarity of words through a scalable geospatial context window

The Random Forest-Based Method of Fine-Resolution Population Spatialization by Using the International Space Station Nighttime Photography and Social Sensing Data

Spatial effects analysis of natural forest canopy cover based on spaceborne LiDAR and geostatistics

Spatial Simulation Modeling of Settlement Distribution Driven by Random Forest: Consideration of Landscape Visibility

A path in regression Random Forest looking for spatial dependence: a taxonomy and a systematic review

Mapping Urban Areas Using A Combination Of Remote Sensing And Geolocation Data

Random Forest Weighted Local Fréchet Regression with Random Objects