High spatio-temporal resolution predictions of PM2.5 using low-cost sensor data
Armita Kar, Mohammed Ahmed, Andrew A. May, Huyen T.K. Le
DOI: https://doi.org/10.1016/j.atmosenv.2024.120486
IF: 5
2024-03-29
Atmospheric Environment
Abstract: We generated PM2.5 predictions at a high spatio-temporal resolution in the Columbus, OH, Denver, CO, and Pittsburgh, PA metropolitan areas using low-cost PurpleAir sensor data. We used four modeling approaches: random forest (RF), random forest spatial interpolation (RFSI), space-time regression kriging (STRK), and random forest kriging (RFK). We trained a separate model for each combination of hour, month, and city to predict PM2.5 concentrations at 8 AM and 6 PM on any specific day at a spatial resolution of 100 m. In most cases, models that account for spatio-temporal relationships (STRK, RFK, and RFSI) perform better than the non-spatio-temporal machine learning model (RF). On average across all models and cities, RFSI (mean MAE = 1.75, R² = 0.67) and STRK (mean MAE = 1.74, R² = 0.63) perform better than RFK (mean MAE = 2.11, R² = 0.59), and STRK produces the clearest spatial patterns. We found that kriging models, especially STRK, are superior at capturing spatio-temporal relationships, and their predictions resemble the general land use pattern of each city, while RFSI models are effective for very large datasets with missing cases. Our study demonstrates a multi-model approach that could inform low-cost sensor deployment to facilitate air quality modeling. Our high-resolution predictions could also facilitate studies on short-term, traffic-based exposure assessment.
environmental sciences, meteorology & atmospheric sciences
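
The abstract names RFSI and reports MAE and R² but does not spell out how either is computed. The following is a minimal sketch, not the authors' implementation. It assumes that RFSI augments each prediction location with the values of, and distances to, its k nearest sensor observations before a random forest is fitted, and that R² is the coefficient of determination (the paper may define it differently). The function names, the choice of k, and the sample sensor readings are all hypothetical and for illustration only.

```python
import math


def neighbor_features(target, sensors, k=3):
    """Build RFSI-style covariates for one prediction location.

    target  -- (x, y) coordinates of the location to predict
    sensors -- iterable of (x, y, value) sensor observations
    Returns [value_1, dist_1, ..., value_k, dist_k], nearest sensor first.
    """
    ranked = sorted(
        (math.dist(target, (x, y)), value) for x, y, value in sensors
    )
    features = []
    for dist, value in ranked[:k]:
        features.extend([value, dist])
    return features


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up sensor readings (x, y, PM2.5 value), not data from the study.
example_sensors = [(0.0, 0.0, 8.0), (3.0, 4.0, 12.0), (6.0, 8.0, 10.0), (0.0, 1.0, 9.0)]
example_features = neighbor_features((0.0, 0.0), example_sensors, k=2)
```

In a full pipeline, feature vectors like `example_features` would be joined with other covariates and passed to a random forest regressor, with one model fitted per hour, month, and city as the abstract describes.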