Socioeconomic and environmental factors of poverty in China using geographically weighted random forest regression model

Yaowen Luo, Jianguo Yan, Stephen C. McClure, Fei Li
DOI: https://doi.org/10.1007/s11356-021-17513-3
IF: 5.8
2022-01-13
Environmental Science and Pollution Research
Abstract: Correlations between socioeconomic factors and poverty in regression models do not reflect actual relationships, especially when data exhibit patterns of spatial heterogeneity. Spatial regression models can estimate the relationships between socioeconomic factors and poverty in defined geographical areas, explaining the imbalanced distribution of poverty. However, the relationships between these factors and poverty are not always linear, and conventional simple linear local regression models do not accurately capture these nonlinear relationships. To fill this gap, we used a local regression method, geographically weighted random forest regression (GW-RFR), that integrates a spatial weight matrix (SWM) and random forest (RF). The GW-RFR evaluates the spatial variations in the nonlinear relationships between variables. A county-level poverty data set of China was employed to evaluate the performance of the GW-RFR against the RF. In this poverty application, the $R^{2}$ of the GW-RFR was 0.128 higher than that of the RF, its NRMSE was 1.6% lower, and its MAE was 0.295 lower. These results showed that the relationship between poverty factors and poverty varies across space at the county level in China, and that the GW-RFR is suitable for dealing with nonlinear relationships in local regression analysis.
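As a rough illustration of two ingredients named in the abstract, the sketch below computes distance-decay spatial weights and the three reported evaluation metrics ($R^{2}$, NRMSE, MAE). It is not the authors' implementation. The abstract does not state which kernel builds the SWM or how the RMSE is normalized, so the bi-square kernel, the fixed bandwidth, and normalization by the observed range are all assumptions, and the numbers are made-up toy values rather than data from the paper.

```python
import math

def bisquare_weights(distances, bandwidth):
    """Bi-square kernel (assumed, not confirmed by the abstract):
    weight is 1 at distance 0 and falls to 0 at or beyond the bandwidth."""
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values
    (range normalization is an assumption; other conventions exist)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values for illustration only (hypothetical, not from the paper).
distances_km = [0, 50, 100, 150]   # distances from a target county
weights = bisquare_weights(distances_km, bandwidth=100)

observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
scores = {
    "R2": r2(observed, predicted),
    "NRMSE": nrmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

In a geographically weighted random forest, weights of this kind would typically decide how strongly each neighboring county influences the local forest fitted at a target county, and the metrics would then be compared between the local and the global model.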
environmental sciences