Incorporating spatial autocorrelation into house sale price prediction using random forest model

Lan Hu, Yongwan Chun, Daniel A. Griffith
DOI: https://doi.org/10.1111/tgis.12931
IF: 2.568
2022-04-29
Transactions in GIS
Abstract: The random forest model has been frequently used for house sale price prediction because researchers have shown that it can yield better results, with smaller prediction errors, than conventional statistical models. However, house sale prices tend to be spatially autocorrelated due to shared neighborhood environments and similar physical characteristics, a property that has rarely been considered or accounted for in the random forest model. A failure to address spatial autocorrelation may result in over- or underpredicted house sale prices, especially when these sale prices are highly spatially dependent. This research proposes an extended random forest approach that introduces additional proxy variables furnishing geographic proximity measures of house locations in order to capture spatial autocorrelation that is unexplained by covariates. An application of the extended random forest approach to house sale prices in Fairfax County, Virginia, confirms the effectiveness of the proxy variables, with an improvement in house sale price prediction accuracy. The introduction of the proxy variables can correct over- and underpredicted house sale prices that show an underlying spatial structure.
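The abstract does not say how the proxy variables are constructed, so the following is only a minimal sketch of the general idea under an assumed, simple form of proximity measure: each house's Euclidean distance to a few anchor points, appended as extra columns to its ordinary covariates. The function names, the anchor points, and the toy data are all hypothetical and are not taken from the paper; the augmented rows could then be passed to any random forest implementation.

```python
import math

def proximity_features(locations, anchors):
    """For each (x, y) location, return its Euclidean distance to every anchor point."""
    return [[math.dist(loc, anchor) for anchor in anchors] for loc in locations]

def augment(covariates, locations, anchors):
    """Append the proximity columns to each row of ordinary covariates."""
    prox = proximity_features(locations, anchors)
    return [list(row) + p for row, p in zip(covariates, prox)]

# Hypothetical toy data: two covariates per house (say, floor area and bedrooms).
covariates = [[120.0, 3], [85.0, 2], [200.0, 5]]
# Hypothetical projected coordinates of the houses and of two anchor points.
locations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
anchors = [(0.0, 0.0), (6.0, 8.0)]

# Each row now holds the original covariates followed by one distance per anchor.
augmented = augment(covariates, locations, anchors)
```

Houses that are close together receive similar proximity values, which gives a tree-based model a way to split on location and so pick up spatial structure that the covariates alone leave unexplained.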
geography