Simulation and prediction research of heavy metal pollution in soil of typical sites based on multi-source heterogeneous data
XU Yang,CHEN Jiansong,WANG Zhidong,JIANG Fangming,ZHANG Qingyu,TANG Kuo,JIANG Hongqiang,DENG Jinsong
DOI: https://doi.org/10.13671/j.hjkxxb.2023.0090
2023-01-01
Abstract: Soil is fundamental to human survival and development. With the continued industrialization and urbanization of China, the risk of heavy metal pollution in soil has increased year by year. To assess whether heavy metal pollution in site soil can be predicted from non-sampling, multi-source heterogeneous data, and to explore the importance and influence of different variables for different heavy metals, 104 typical polluted sites in Zhejiang Province were selected as research objects. A risk identification index system for heavy metal pollution in site soil was constructed on the principle of mass balance, comprising four major variable groups (enterprise production, atmospheric deposition, plant enrichment, and soil leaching) and twenty secondary variables. Random Forest and feature importance analysis were used to build prediction models and to assess variable importance, using the results of detailed site soil pollution surveys. The results show that: (1) The mean coefficient of determination of the Random Forest prediction models based on site mean values (mean R² = 0.75) was higher than that of the models based on site maximum values (mean R² = 0.62); that is, the method predicts site mean values of heavy metal pollution better than site maximum values. Ranked from highest to lowest accuracy, the mean-based models are nickel, mercury, cadmium, lead, copper, and arsenic, whereas the maximum-based models are cadmium, nickel, arsenic, lead, copper, and mercury.
The coefficient of determination of the prediction model based on the Nemerow comprehensive pollution index (R² = 0.84) was higher than the average of the single-metal prediction models but lower than those of the mean-based models for nickel (R² = 0.92) and mercury (R² = 0.91); (2) The model predicted extremely high values poorly, and the multi-source heterogeneous database represented such samples weakly, which may be the main reason why the models based on maximum values were less accurate overall than the models based on mean values; (3) By average feature importance, the industrial-environmental impact grade (15.57%), the proportion of soil clay particles (9.55%), highway density (8.44%), enterprise operation time (7.31%), solar radiation intensity (7.06%), and enterprise establishment time (6.56%) were identified as the variables most closely related to soil heavy metal pollution. This study aims to provide a new approach to analyzing heavy metal pollution in soil: using multi-source heterogeneous data to identify areas where heavy metal pollution risks may exist, analyzing the relationships between individual heavy metals and feature variables, and providing more accurate information for soil heavy metal pollution investigation and environmental protection.
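
The abstract does not define the Nemerow comprehensive pollution index it refers to, so the following is only a minimal sketch of its commonly used form. In that form, each metal's single-factor index is the measured concentration divided by a reference standard, and the comprehensive index is the square root of the mean of the squared average and squared maximum single-factor indices. The function names, the example concentrations, and the reference standards are all hypothetical and are not taken from the study.

```python
import math


def single_factor_indices(concentrations, standards):
    """Return P_i = C_i / S_i for each metal.

    Both arguments are dicts keyed by metal name, in the same units
    (for example mg/kg). The keys and values are illustrative only.
    """
    return {metal: concentrations[metal] / standards[metal]
            for metal in concentrations}


def nemerow_index(indices):
    """Common form of the Nemerow comprehensive pollution index:

        P_N = sqrt((P_avg^2 + P_max^2) / 2)

    where P_avg and P_max are the mean and maximum single-factor indices.
    """
    values = list(indices.values())
    if not values:
        raise ValueError("at least one single-factor index is required")
    p_avg = sum(values) / len(values)
    p_max = max(values)
    return math.sqrt((p_avg ** 2 + p_max ** 2) / 2)


# Hypothetical site values (mg/kg); NOT data or standards from the study.
site_concentrations = {"Cd": 0.45, "Pb": 96.0, "Ni": 38.0}
reference_standards = {"Cd": 0.30, "Pb": 120.0, "Ni": 100.0}

indices = single_factor_indices(site_concentrations, reference_standards)
site_nemerow = nemerow_index(indices)
```

Because the maximum single-factor index enters the formula directly, a single strongly enriched metal raises the comprehensive index even when the other metals are well below their standards.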