A novel ensemble-based statistical approach to estimate daily wildfire-specific PM2.5 in California (2006-2020)

Rosana Aguilera, Nana Luo, Rupa Basu, Jun Wu, Rachel Clemesha, Alexander Gershunov, Tarik Benmarhnia
DOI: https://doi.org/10.1016/j.envint.2022.107719
IF: 11.8
2022-12-25
Environment International
Abstract: Though fine particulate matter (PM2.5) has decreased in the United States (U.S.) in the past two decades, the increasing frequency, duration, and severity of wildfires significantly (though episodically) impair air quality in wildfire-prone regions and beyond. Increasing PM2.5 concentrations derived from wildfire smoke and associated impacts on public health require dedicated epidemiological studies. The main sources of PM2.5 data are government-operated monitors sparsely located across the U.S., leaving several regions and potentially vulnerable populations unmonitored. Current approaches to estimate PM2.5 concentrations in unmonitored areas often rely on big data, such as satellite-derived aerosol properties and meteorological variables, apply computationally intensive deterministic modeling, and do not distinguish wildfire-specific PM2.5 from other sources of emissions such as traffic and industrial sources. Furthermore, modeling wildfire-specific PM2.5 presents a challenge since measurements of the smoke contribution to PM2.5 pollution are not available. Here, we aim to use statistical methods to isolate wildfire-specific PM2.5 from other sources of emissions. Our study presents an ensemble model that optimally combines multiple machine learning algorithms (including gradient boosting machine, random forest, and deep learning) and a large set of explanatory variables to, first, estimate daily PM2.5 concentrations at the ZIP code level, a relevant spatiotemporal resolution for epidemiological studies. Subsequently, we propose a novel implementation of an imputation approach to estimate the wildfire-specific PM2.5 concentrations that could be applied to other geographical regions in the U.S. or worldwide. Our ensemble model achieved comparable results to previous machine learning studies for PM2.5 prediction while avoiding processing larger, computationally intensive datasets. Our study is the first to apply a suite of statistical models using readily available datasets to provide daily wildfire-specific PM2.5 at a fine spatial scale for a 15-year period, thus providing a relevant spatiotemporal resolution and timely contribution for epidemiological studies.
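The abstract names an imputation approach for isolating wildfire-specific PM2.5 but does not give its details. The following is only a minimal sketch of one way such a step could work, under assumptions that are not stated in the abstract: each ZIP code and day carries a hypothetical `smoke` flag, the non-smoke (counterfactual) PM2.5 on a smoke day is imputed from that ZIP code's non-smoke days, and the wildfire-specific part is the observed value minus the imputed one, floored at zero. The per-ZIP median used here is a deliberately simple stand-in for the paper's statistical models, and the function and field names are illustrative.

```python
from statistics import median


def wildfire_pm25(records):
    """Estimate wildfire-specific PM2.5 per (zip, date).

    records: list of dicts with keys 'zip', 'date', 'pm25', 'smoke' (bool).
    The 'smoke' flag and the median-based imputation are assumptions made
    for illustration, not the method described in the abstract.
    """
    # Collect PM2.5 values observed on non-smoke days, per ZIP code.
    background = {}
    for r in records:
        if not r["smoke"]:
            background.setdefault(r["zip"], []).append(r["pm25"])

    result = {}
    for r in records:
        key = (r["zip"], r["date"])
        if not r["smoke"]:
            # No smoke flagged: attribute nothing to wildfire.
            result[key] = 0.0
            continue
        non_smoke_values = background.get(r["zip"])
        if not non_smoke_values:
            # No non-smoke days to impute from: leave undefined.
            result[key] = None
            continue
        # Impute the counterfactual non-smoke concentration (simple stand-in).
        counterfactual = median(non_smoke_values)
        # Wildfire-specific part is the excess over the counterfactual.
        result[key] = max(r["pm25"] - counterfactual, 0.0)
    return result


sample = [
    {"zip": "90001", "date": "2020-09-01", "pm25": 8.0, "smoke": False},
    {"zip": "90001", "date": "2020-09-02", "pm25": 10.0, "smoke": False},
    {"zip": "90001", "date": "2020-09-03", "pm25": 12.0, "smoke": False},
    {"zip": "90001", "date": "2020-09-04", "pm25": 45.0, "smoke": True},
    {"zip": "90001", "date": "2020-09-05", "pm25": 9.0, "smoke": True},
    {"zip": "95969", "date": "2020-09-04", "pm25": 60.0, "smoke": True},
]
estimates = wildfire_pm25(sample)
```

In this toy input, the smoke day with 45.0 is compared against a non-smoke median of 10.0, while the ZIP code with no non-smoke days is left undefined rather than guessed. A real application would replace the median with a model that accounts for season, meteorology, and neighboring monitors.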
environmental sciences