Estimating hourly PM2.5 concentrations at the neighborhood scale using a low-cost air sensor network: A Los Angeles case study

Yougeng Lu, Genevieve Giuliano, Rima Habre
DOI: https://doi.org/10.1016/j.envres.2020.110653
Abstract: Predicting PM2.5 concentrations at a fine spatial and temporal resolution (i.e., neighborhood, hourly) is challenging. Recent growth in low-cost sensor networks is providing increased spatial coverage of air quality data that can be used to supplement data provided by the monitors of regulatory agencies. We developed an hourly, 500 m × 500 m gridded PM2.5 model that integrates PurpleAir low-cost air sensor network data for Los Angeles County. We developed a quality control scheme for PurpleAir data. We included spatially and temporally varying predictors in a random forest model with random oversampling of high concentrations to predict PM2.5. The model achieved high prediction accuracy (10-fold cross-validation (CV) R² = 0.93, root mean squared error (RMSE) = 3.23 μg/m³; spatial CV R² = 0.88, spatial RMSE = 4.33 μg/m³; temporal CV R² = 0.90, temporal RMSE = 3.85 μg/m³). Our model was able to predict spatial and diurnal patterns in PM2.5 on typical weekdays and weekends, as well as on non-typical days such as holidays and wildfire days. The model allows for far more precise estimates of PM2.5 than existing methods based on few sensors. Because they take advantage of low-cost PM2.5 sensors, our hourly random forest model predictions can be combined with time-activity diaries in future studies, enabling geographically and temporally fine exposure estimation for specific population groups in studies of acute air pollution health effects and of environmental justice issues.
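The evaluation ideas named in the abstract (random oversampling of high concentrations, RMSE, R², and grouped cross-validation) can be illustrated with a minimal standard-library Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the helper names (`oversample_high`, `group_folds`), the dictionary keys, the threshold, and the duplication factor are hypothetical. The sketch assumes that spatial CV holds out whole sensors and temporal CV holds out whole time periods, which the abstract does not spell out, and it computes R² as the coefficient of determination, which may differ from the paper's definition. The random forest itself is omitted.

```python
import math
import random


def oversample_high(rows, target_key, threshold, factor, seed=0):
    """Randomly duplicate (with replacement) rows whose target exceeds
    `threshold`, so high values appear roughly `factor` times as often.
    Apply this to training rows only, never to held-out folds."""
    rng = random.Random(seed)
    high = [r for r in rows if r[target_key] > threshold]
    n_extra = int(len(high) * (factor - 1))
    extra = [rng.choice(high) for _ in range(n_extra)] if high else []
    return list(rows) + extra


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def group_folds(rows, group_key, k, seed=0):
    """Assign whole groups to k folds and return row indices per fold.
    Grouping by a sensor ID gives a spatial-style split; grouping by a
    date gives a temporal-style split (both are assumptions here)."""
    groups = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(groups)
    fold_of = {g: i % k for i, g in enumerate(groups)}
    folds = [[] for _ in range(k)]
    for idx, r in enumerate(rows):
        folds[fold_of[r[group_key]]].append(idx)
    return folds
```

In a grouped split, every row from a given sensor (or day) lands in the same fold, so the model is always scored on locations or periods it never saw during training. That is one plausible reason the spatial and temporal CV scores in the abstract are somewhat lower than the ordinary 10-fold scores.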
What problem does this paper attempt to address?