Imputing environmental impact missing data of the industrial sector for Chinese cities: A machine learning approach

Xi Chen, Chenyang Shuai, Bu Zhao, Yu Zhang, Kaijian Li
DOI: https://doi.org/10.1016/j.eiar.2023.107050
IF: 6.122
2023-02-10
Environmental Impact Assessment Review
Abstract: Data are the lifeblood of evidence-based decision-making and the raw material for accountability. Collecting data to regularly evaluate industrial consumption and pollution at the city level is not an easy task. It requires a significant investment of institutional and financial resources and engagement with a vast number of local governments. Although the Chinese government has put extensive human and financial resources into data collection, substantial data gaps remain. This study compared two traditional linear models and four machine learning models for computationally estimating the missing data of six industrial consumption and pollution indicators (responses). The data cover 701 cities from 2006 to 2018 and use ten predictors. Results showed that the decision-tree-based extreme gradient boosting (XGBoost) model performed best among the six models. Across the six responses, the median coefficient of determination (R²) ranged from 0.85 to 0.94, and the median root mean squared error (RMSE) ranged from 8.5 to 17,776. This study provides high-quality, detailed data for industrial environmental analysis of Chinese cities. In addition, given its good generalization ability, the extreme gradient boosting model could be adapted to impute missing data for other environmental variables, for other sectors, and at an even smaller scale.
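The abstract's core workflow is to train a gradient-boosted tree model on the records where an indicator is observed, predict the indicator where it is missing, and score the predictions with R² and RMSE. A small sketch can make this concrete. It is a simplified, dependency-free stand-in, not the authors' code or the XGBoost library. It uses plain gradient boosting with depth-one regression trees ("stumps") under squared loss and leaves out XGBoost's regularization and second-order updates. The function names (`fit_boosting`, `impute`), the hyperparameters (`n_rounds`, `learning_rate`) and the toy data are all hypothetical.

```python
import math
from statistics import mean


def fit_stump(X, residuals):
    """Find the single (feature, threshold) split minimising squared error."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every predictor is constant: predict the mean everywhere
        m = mean(residuals)
        return (0, math.inf, m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps; residuals are the negative squared-loss gradient."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def impute(X, y, **kwargs):
    """Fill None entries of y using a model trained on the rows where y is observed."""
    observed = [i for i, v in enumerate(y) if v is not None]
    model = fit_boosting([X[i] for i in observed], [y[i] for i in observed], **kwargs)
    return [v if v is not None else predict(model, X[i]) for i, v in enumerate(y)]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - ybar) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the response."""
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))


if __name__ == "__main__":
    # Hypothetical toy data: two predictors, one response with a step effect.
    X = [[i, i % 7] for i in range(30)]
    y_true = [2.0 * a + (10.0 if b > 3 else 0.0) for a, b in X]
    # Mask every fifth value to mimic missing records, then score the imputations.
    y_obs = [None if i % 5 == 0 else v for i, v in enumerate(y_true)]
    filled = impute(X, y_obs, n_rounds=100, learning_rate=0.3)
    missing = [i for i, v in enumerate(y_obs) if v is None]
    truth = [y_true[i] for i in missing]
    guess = [filled[i] for i in missing]
    print("R2 on masked values:", round(r2(truth, guess), 3))
    print("RMSE on masked values:", round(rmse(truth, guess), 3))
```

Masking known values and scoring the model on them is one common way to validate an imputer. Because each indicator has its own units, RMSE is only comparable within a response, not across responses. This is consistent with the abstract's wide RMSE range (8.5 to 17,776) next to a narrow R² range. For real use, one would likely swap the toy booster for a tuned library model such as `xgboost.XGBRegressor`.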
environmental studies