Improving the Efficiency of Machine Learning in Simulating Sedimentary Heavy Metal Contamination by Coupling Preposing Feature Selection Methods

Ligang Deng, Xiang Gao, Bisheng Xia, Jinhua Wang, Qianying Dai, Yifan Fan, Siyuan Wang, Huiming Li, Xin Qian
DOI: https://doi.org/10.1016/j.chemosphere.2023.138205
IF: 8.8
2023-01-01
Chemosphere
Abstract: Sediment cores were collected from Taihu Lake in China, and their chronology was determined by radionuclide dating. The heavy metal concentrations and magnetic properties of each core slice were measured. The concentrations of most heavy metals surged at 20 cm below the sediment surface, accompanied by an increase in the concentration of single-domain magnetic particles. This may have resulted from the influence of anthropogenic activities on the lake's environment after the 1970s. Two feature selection methods, random forest (RF) and maximal information coefficient (MIC), were combined with a support vector machine (SVM) model to simulate heavy metals from the selected magnetic and physicochemical parameters. Compared with the modeling results obtained with the full set of parameters, a reasonable simulation performance was obtained with RF and MIC. RF performed better than MIC, increasing the R² of the simulation models for Cd, Cr, Cu, Pb, and Sb. For heavy metals with high ecological risks (As, Cd, Cr, Hg, Pb, Sb), the correlation coefficients between observed and predicted data ranged from 0.73 to 0.97 with only 14-27% of the parameters selected by RF as input variables. The RF-RBF-SVM model enabled heavy metal predictions based on the magnetic properties of the lake sediments.
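The coupling described in the abstract (RF-based feature selection placed ahead of an RBF-kernel SVM) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the sediment measurements, and the number of retained features (`n_keep`), the hyperparameters, and all variable names are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the magnetic and physicochemical parameters
# (assumption: 20 candidate predictors, only three of which drive the target).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
y = (2.0 * X[:, 0] - 1.5 * X[:, 3] + 1.0 * X[:, 7]
     + rng.normal(scale=0.1, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: rank predictors by random forest importance and keep the top n_keep.
# threshold=-np.inf makes max_features the only selection criterion.
n_keep = 4  # hypothetical: 20% of the candidate predictors
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    max_features=n_keep,
    threshold=-np.inf,
)

# Step 2: fit an RBF-kernel support vector regressor on the selected predictors.
model = make_pipeline(
    selector,
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, gamma="scale"),
)
model.fit(X_train, y_train)

selected = np.flatnonzero(model.named_steps["selectfrommodel"].get_support())
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data
print("selected feature indices:", selected)
print("test R^2:", round(r2, 3))
```

Placing the selector inside the pipeline means the feature ranking is learned from the training split only, which avoids leaking information from the held-out samples into the reported score. Swapping in a MIC-based ranking would require a separate library and is not shown here.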