A data-driven approach to identifying PFAS water sampling priorities in Colorado, United States

Kelsey E Barton, Peter J Anthamatten, John L Adgate, Lisa M McKenzie, Anne P Starling, Kevin Berg, Robert C Murphy, Kristy Richardson
DOI: https://doi.org/10.1038/s41370-024-00705-7
2024-08-01
Abstract:
Background: Per- and polyfluoroalkyl substances (PFAS), a class of environmentally and biologically persistent chemicals, have been used across many industries since the middle of the 20th century. Some PFAS have been linked to adverse health effects.
Objective: Our objective was to incorporate known and potential PFAS sources, physical characteristics of the environment, and existing PFAS water sampling results into a PFAS risk prediction map that may be used to develop a PFAS water sampling prioritization plan for the Colorado Department of Public Health and Environment (CDPHE).
Methods: We used random forest classification to develop a predictive surface of potential groundwater contamination from two PFAS, perfluorooctane sulfonate (PFOS) and perfluorooctanoate (PFOA). After being "trained" with existing PFAS water sampling data, the model assigned each location without sampling data to one of three PFAS risk categories. We used prediction results, variable importance ranking, and population characteristics to develop recommendations for sampling prioritization.
Results: Sensitivity and precision ranged from 58% to 90% in the final models, depending on the risk category. The model and prioritization approach identified private wells in specific census blocks, as well as schools, mobile home parks, and public water systems that rely on groundwater, as priority sampling locations. We also identified data gaps, including areas of the state with limited sampling and potential source types that need further investigation.
Impact statement: This work uses random forest classification to predict the risk of groundwater contamination from two per- and polyfluoroalkyl substances (PFAS) across the state of Colorado, United States. We developed the prediction model using data on known and potential PFAS sources and physical characteristics of the environment, and "trained" the model using existing PFAS water sampling results. This data-driven approach identifies opportunities for PFAS water sampling prioritization as well as information gaps that, if filled, could improve model predictions. This work provides decision-makers information to effectively use limited resources towards protection of populations most susceptible to the impacts of PFAS exposure.
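The abstract reports sensitivity and precision separately for each risk category. As a minimal sketch of how such per-category figures are typically computed for a three-class classifier, the example below uses a small made-up set of labels. The category names (`low`, `medium`, `high`), the function name `per_class_metrics`, and the toy data are all assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical category names; the abstract only says there are three.
RISK_CATEGORIES = ("low", "medium", "high")


def per_class_metrics(actual, predicted, categories=RISK_CATEGORIES):
    """Return {category: (sensitivity, precision)} for a multi-class classifier.

    sensitivity = correctly predicted as category / all samples truly in category
    precision   = correctly predicted as category / all samples predicted as category
    """
    pairs = Counter(zip(actual, predicted))
    metrics = {}
    for cat in categories:
        true_pos = pairs[(cat, cat)]
        actual_total = sum(n for (a, _), n in pairs.items() if a == cat)
        predicted_total = sum(n for (_, p), n in pairs.items() if p == cat)
        sensitivity = true_pos / actual_total if actual_total else float("nan")
        precision = true_pos / predicted_total if predicted_total else float("nan")
        metrics[cat] = (sensitivity, precision)
    return metrics


# Toy labels, invented for illustration only (not the study's data).
actual = ["low", "low", "low", "low", "medium", "medium", "medium",
          "high", "high", "high"]
predicted = ["low", "low", "low", "medium", "medium", "medium", "high",
             "high", "high", "low"]

metrics = per_class_metrics(actual, predicted)
for cat, (sens, prec) in metrics.items():
    print(f"{cat}: sensitivity={sens:.2f}, precision={prec:.2f}")
```

Reporting the two measures per category, rather than a single overall accuracy, shows where a classifier misses truly high-risk locations (low sensitivity) versus where it over-flags locations (low precision).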