Prediction of arsenic and fluoride in groundwater of the North China Plain using enhanced stacking ensemble learning

Wengeng Cao, Zhuo Zhang, Yu Fu, Lihua Zhao, Yu Ren, Tian Nan, Huaming Guo
DOI: https://doi.org/10.1016/j.watres.2024.121848
2024-08-01
Abstract: Chronic exposure to elevated geogenic arsenic (As) and fluoride (F-) concentrations in groundwater poses a significant global health risk. In regions where regular groundwater quality assessments are limited, it remains uncertain whether shallow groundwater extracted from specific wells contains harmful levels of As and F-. This study used an enhanced stacking ensemble learning model to predict the distributions of As and F- in shallow groundwater, based on 4,393 available records of observed concentrations and forty relevant environmental factors. The enhanced model fused well-suited Extreme Gradient Boosting, Random Forest, and Support Vector Machine base learners with a structurally simple Linear Discriminant Analysis meta-learner. The model accurately captured the patchy distributions of groundwater As and F-, with AUC values of 0.836 and 0.853, respectively. The findings revealed that 9.0% of the study area was characterized by high As risk in shallow groundwater, 21.2% by high F- risk, and about 0.2% by elevated levels of both. The affected populations are estimated at approximately 7.61 million, 34.1 million, and 0.2 million, respectively. Furthermore, the sedimentary environment exerted the greatest influence on the distribution of groundwater As (29.5%), followed closely by human activities (28.1%) and climate (21.9%). Likewise, the sedimentary environment was the primary factor affecting the distribution of groundwater F- (27.8%), followed by hydrogeology (24.0%) and soil physicochemical properties (23.3%). This study contributes to identifying the health risks associated with As and F- in shallow groundwater and provides insights for evaluating health risks in regions with limited samples.
What problem does this paper attempt to address?