Assessing NO2 Exposures with High Spatiotemporal Resolution Across the Contiguous United States Using Ensemble Model
Di Q, Amini H, Kloog I, Silvern R, Kelly J, Sabath M, Choirat C, Koutrakis P, Lyapustin A, Wang Y, Schwartz J
DOI: https://doi.org/10.1097/01.ee9.0000609924.28602.10
2019-01-01
Environmental Epidemiology
Abstract: OPS 03: Machine learning in environmental epidemiology, Room 315, Floor 3, August 26, 2019, 4:30 PM - 5:30 PM.
Background: NO2 is an air pollutant that leads to multiple adverse health outcomes. Various modeling approaches have been proposed to estimate NO2, including statistical regressions, machine learning algorithms, and hybrid models, with predictor variables including land-use terms, satellite-derived column NO2 concentrations, and meteorological variables. These complementary fitting methods and predictor variables have the potential to improve model performance. However, few studies have integrated multiple fitting methods to estimate NO2.
Methods: We propose an ensemble model that integrates multiple machine learning algorithms, including a neural network, random forest, and gradient boosting, with a variety of predictor variables as inputs. This NO2 model covers the entire contiguous United States from 2000 to 2016. After model training, we predicted daily NO2 levels at 1 km × 1 km grid cells, as well as the associated monthly uncertainty levels. We also downscaled the 1-km-level predictions to the 100-meter level.
Results: In cross-validation, the ensemble model performed well, with a mean R2 of 0.788 and a mean spatial R2 of 0.844. The relationship between daily monitored NO2 and predicted NO2 is almost linear. The distribution of NO2 exhibits clear spatial clustering, with high concentrations around urban areas, especially major cities, and along highways. Temporally, NO2 levels declined substantially over the study area, with the annual level in 2016 about 50% of the 2000 level.
Conclusion: This NO2 estimation has very high spatiotemporal resolution (daily and 1 km × 1 km), covers a large spatial area (the contiguous United States), and gives epidemiologists an exposure assessment suitable for analyzing the long-term and short-term health effects of NO2.
We also conclude that the most appropriate predictor variables and fitting algorithm depend on context. It is time to consider how to integrate different predictor variables and fitting algorithms to achieve optimized models for air pollution estimation.
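
The abstract does not say how the base learners are combined or exactly how the spatial R2 is defined, so the following Python sketch is only one plausible illustration, not the authors' method. It assumes inverse-MSE weighting to blend held-out predictions from three base learners, and it assumes that spatial R2 means the R2 between per-site averages of observed and predicted values. All data values, site labels, and function names are hypothetical toy inputs.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def inverse_mse_weights(observed, base_predictions):
    """Weight each base learner by 1 / MSE on held-out data, normalized to sum to 1.

    Assumption: this simple weighting stands in for whatever ensemble
    step the authors used, which the abstract does not describe.
    """
    inverse_errors = {}
    for name, preds in base_predictions.items():
        mse = mean((o - p) ** 2 for o, p in zip(observed, preds))
        inverse_errors[name] = 1.0 / mse
    total = sum(inverse_errors.values())
    return {name: value / total for name, value in inverse_errors.items()}


def blend(base_predictions, weights):
    """Weighted average of the base learners' predictions, record by record."""
    n_records = len(next(iter(base_predictions.values())))
    return [
        sum(weights[name] * base_predictions[name][i] for name in weights)
        for i in range(n_records)
    ]


def spatial_r_squared(sites, observed, predicted):
    """R2 between per-site means (assumed definition of 'spatial R2')."""
    obs_by_site = defaultdict(list)
    pred_by_site = defaultdict(list)
    for site, o, p in zip(sites, observed, predicted):
        obs_by_site[site].append(o)
        pred_by_site[site].append(p)
    site_order = sorted(obs_by_site)
    obs_means = [mean(obs_by_site[s]) for s in site_order]
    pred_means = [mean(pred_by_site[s]) for s in site_order]
    return r_squared(obs_means, pred_means)


# Hypothetical toy data: two daily NO2 readings at each of three monitors.
sites = ["A", "A", "B", "B", "C", "C"]
observed = [10.0, 12.0, 20.0, 22.0, 30.0, 34.0]
base_predictions = {
    "neural_network": [11.0, 13.0, 19.0, 23.0, 29.0, 33.0],
    "random_forest": [9.0, 12.5, 21.0, 21.0, 31.0, 35.0],
    "gradient_boosting": [10.5, 11.0, 20.5, 23.0, 28.0, 33.0],
}

weights = inverse_mse_weights(observed, base_predictions)
ensemble = blend(base_predictions, weights)

print("weights:", weights)
print("overall R2:", r_squared(observed, ensemble))
print("spatial R2:", spatial_r_squared(sites, observed, ensemble))
```

In a real analysis the weights would be fit on cross-validation folds that are kept separate from the data used to report R2, so that the ensemble's score is not inflated by reuse of the same monitors.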