Accurate prediction of spatial distribution of soil potentially toxic elements using machine learning and associated key influencing factors identification: A case study in mining and smelting area in southwestern China

Kai Li, Guanghui Guo, Degang Zhang, Mei Lei, Yingying Wang
DOI: https://doi.org/10.1016/j.jhazmat.2024.135454
2024-08-07
Abstract: Accurate prediction of the spatial distribution of potentially toxic elements (PTEs) is crucial for soil pollution prevention and risk control. Achieving accurate prediction of the spatial distribution of soil PTEs at a large scale using conventional methods presents significant challenges. In this study, machine learning (ML) models, specifically artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGB), were used to predict the spatial distribution of soil PTEs and identify the associated key factors in a mining and smelting area located in Yunnan Province, China, under three scenarios: (1) natural + socioeconomic + spatial datasets (NS), (2) NS + irrigation pollution index (IPI) datasets, and (3) NS + IPI + deposition (DEPO) datasets. The results showed that the NS + IPI + DEPO combination yielded the highest predictive accuracy across the ML models. In particular, XGB exhibited the highest performance for As (R² = 0.7939), Cd (R² = 0.6679), Cu (R² = 0.8519), Pb (R² = 0.8317), and Zn (R² = 0.7669), whereas RF performed best for Ni (R² = 0.7146). Feature importance and Shapley additive explanation (SHAP) analysis revealed that DEPO and IPI were the pivotal factors influencing the distribution of soil PTEs. Our findings highlight the important role of DEPO in predicting the spatial distribution of soil PTEs, which has often been ignored in previous studies.
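The nested scenario design and the R² metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the covariate names in `FEATURE_GROUPS` are hypothetical placeholders (the abstract does not list the actual variables), and R² is assumed here to be the usual coefficient of determination, 1 - SS_res / SS_tot.

```python
from typing import Dict, List, Sequence

# Hypothetical covariate names for illustration only; the abstract
# names the three dataset groups but not their individual variables.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "NS": ["elevation", "distance_to_smelter", "population_density"],
    "IPI": ["irrigation_pollution_index"],
    "DEPO": ["atmospheric_deposition"],
}


def build_scenarios(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the three nested scenarios: NS, NS+IPI, NS+IPI+DEPO."""
    scenarios: Dict[str, List[str]] = {}
    names: List[str] = []
    features: List[str] = []
    for name in ("NS", "IPI", "DEPO"):
        names.append(name)
        features = features + groups[name]  # each scenario extends the previous one
        scenarios["+".join(names)] = list(features)
    return scenarios


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R²)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


scenarios = build_scenarios(FEATURE_GROUPS)
for label, columns in scenarios.items():
    print(label, "->", columns)
```

In a workflow of this kind, each model would be fitted once per scenario on the corresponding columns, and `r2_score` would be computed on held-out samples, so that any gain from adding IPI and then DEPO can be compared directly.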