Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Introduction</h3><p>Estimating PM<sub>2.5</sub> concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Methods</h3><p>Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM<sub>2.5</sub> prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008–2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM<sub>2.5</sub> emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). Because MAIAC AOD, one of the primary predictors of interest, has substantial missing data in California due to bright surfaces, cloud cover and other known interferences, missing AOD observations were imputed and adjusted for relative humidity and vertical distribution.
Wildfire smoke contribution to PM<sub>2.5</sub> was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>Ensemble deep learning to predict PM<sub>2.5</sub> achieved an overall mean training RMSE of 1.54 μg/m<sup>3</sup> (R<sup>2</sup>: 0.94) and test RMSE of 2.29 μg/m<sup>3</sup> (R<sup>2</sup>: 0.87). The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM<sub>2.5</sub> sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m<sup>3</sup>). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM<sub>2.5</sub>. The coefficient of variation indicated highest uncertainty over deciduous and mixed forests and open water land covers.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion</h3><p>Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrains, where PM<sub>2.5</sub> has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.</p>
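The evaluation quantities named in the abstract (RMSE, R², and the coefficient of variation used as an uncertainty measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy numbers are hypothetical, and it assumes the coefficient of variation is the standard deviation across ensemble members divided by their mean, and that R² is the usual coefficient of determination.

```python
import math
import statistics


def ensemble_summary(member_preds):
    """Mean and coefficient of variation across ensemble members for one grid cell.

    Assumption: CV = sample standard deviation / mean of the member predictions.
    """
    mean = statistics.fmean(member_preds)
    sd = statistics.stdev(member_preds)
    cv = sd / mean if mean != 0 else float("nan")
    return mean, cv


def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: three grid cells, three ensemble members each (ug/m^3).
members = [
    [10.0, 12.0, 11.0],
    [5.0, 5.0, 5.0],
    [20.0, 30.0, 25.0],
]
observed = [12.0, 5.0, 23.0]

summaries = [ensemble_summary(m) for m in members]
means = [s[0] for s in summaries]
cvs = [s[1] for s in summaries]

print("ensemble means:", means)
print("coefficients of variation:", cvs)
print("RMSE:", rmse(observed, means))
print("R^2:", r_squared(observed, means))
```

In this toy setup, a cell whose members agree exactly has a coefficient of variation of zero, while cells where members disagree get larger values, which is the sense in which the abstract uses it to flag land covers with higher prediction uncertainty.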

A novel ensemble-based statistical approach to estimate daily wildfire-specific PM 2.5 in California (2006-2020)

Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke

A statistical model for predicting PM2.5 for the western United States

Satellite-Based Daily PM 2.5 Estimates During Fire Seasons in Colorado

Daily Fine Resolution Estimates of the Influence of Wildfires on Fine Particulate Matter in California, 2011–2020

An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution

Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California

High Spatiotemporal Resolution PM2.5 Concentration Estimation with Machine Learning Algorithm: A Case Study for Wildfire in California

A model for rapid PM2.5 exposure estimates in wildfire conditions using routinely available data: rapidfire v0.1.3

Evaluating Chemical Transport and Machine Learning Models for Wildfire Smoke PM2.5: Implications for Assessment of Health Impacts

A comparison of statistical and machine learning methods for creating national daily maps of ambient PM2.5 concentration

Super-learning and Ensemble Weighted Averaging Models to Predict Hyperlocal Long-Term Exposure to Fine Particulate Matter Components in the United States

A spatial causal analysis of wildland fire-contributed PM2.5 using numerical model output

Multi-Agency Ensemble Forecast of Wildfire Air Quality in the United States: Toward Community Consensus of Early Warning

Ensemble PM2.5 Forecasting During the 2018 Camp Fire Event Using the HYSPLIT Transport and Dispersion Model

Nationwide estimation of daily ambient PM2.5 from 2008 to 2020 at 1 km2 in India using an ensemble approach

A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making

Spatial Heterogeneity of the Respiratory Health Impacts of Wildfire Smoke PM2.5 in California

Combining Satellite Imagery and Numerical Model Simulation to Estimate Ambient Air Pollution: An Ensemble Averaging Approach

A Machine Learning Method to Estimate PM2.5 Concentrations Across China with Remote Sensing, Meteorological and Land Use Information.

Assessing the 2023 Canadian wildfire smoke impact in Northeastern US: Air quality, exposure and environmental justice