Machine Learning Approach for Spatially and Temporally Resolved PM2.5 Exposures in the Continental United States

Qian Di, Petros Koutrakis, Christine Choirat, Francesca Dominici, Joel D. Schwartz
DOI: https://doi.org/10.1289/isee.2017.2017-389
2018-01-01
Abstract:
Background/Aim: Deep learning is a class of machine learning algorithms. The convolutional neural network (CNN), a deep learning algorithm, has brought about breakthroughs in processing images, speech, audio and text. In a CNN, a convolutional layer combines information from nearby pixels to build high-level abstractions that improve model performance. Traditional air pollution modelling lacks this abstraction capacity: it uses variable values at monitoring stations (in-situ information) to establish relationships and make predictions. However, neighbouring information (e.g., nearby traffic volume, neighbouring land-use type) also affects local PM2.5 measurements but has often been ignored in previous modelling. A convolutional neural network can potentially take neighbouring information into account and thereby improve model performance.
Methods: We used a convolutional neural network with multiple predictors, including aerosol optical depth, chemical transport model outputs, land-use variables, meteorological variables, surface reflectance and absorbing aerosol index, to model ground-level PM2.5 measured at monitoring stations. For each variable, we extracted its values at the monitoring stations as well as at neighbouring locations as input to the neural network. We incorporated multiple convolutional layers, pooling layers and fully connected layers in the neural network to model complex relationships between variables.
Results: Model performance on the validation data set was good, with daily R2 = 0.84 and MSE = 2.94 µg/m3. Model performance also varied by region, being higher in the Eastern and Central U.S. than in the Western U.S. The model still performed well at low PM2.5 levels (<12 µg/m3). Predictions indicated higher PM2.5 concentrations in the Eastern and Central U.S., and summertime PM2.5 levels were higher than those in other seasons.
Conclusions: This study explored a deep learning approach that models air pollution with high accuracy, which facilitates follow-up epidemiological studies. It also suggests that deep learning techniques could be applied more widely in environmental epidemiology.
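The abstract describes the core idea (feed each monitor's own value plus its neighbourhood through convolutional, pooling and fully connected layers) but not the architecture details. The sketch below illustrates that idea in dependency-free Python under stated assumptions: the window half-width, 3x3 kernels, a single convolutional layer, zero padding at grid edges, ReLU activations, 2x2 max pooling and a single linear output unit are illustrative choices, not the authors' configuration, and every function name (`extract_patch`, `predict_pm25`, `random_params`, etc.) is hypothetical. Weights are random and untrained, so the output is not a meaningful PM2.5 value, and training is omitted. The metrics use plain MSE and one common definition of R2 (1 - SS_res/SS_tot); the paper may compute its validation statistics differently.

```python
import random
from typing import List

# Hypothetical representation of one gridded predictor (e.g. AOD) as rows of floats.
Grid = List[List[float]]

def extract_patch(grid: Grid, row: int, col: int, half: int) -> Grid:
    """Return the (2*half+1) x (2*half+1) window centred on a monitor's cell.
    Cells outside the grid are zero-padded (an assumed edge policy)."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [grid[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else 0.0
         for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]

def conv2d_valid(channels: List[Grid], kernels: List[Grid], bias: float) -> Grid:
    """One feature map: 'valid' 2-D cross-correlation summed over input channels
    (the operation CNN libraries call convolution). Assumes square inputs."""
    k = len(kernels[0])
    size = len(channels[0]) - k + 1
    out = [[bias] * size for _ in range(size)]
    for ch, ker in zip(channels, kernels):
        for i in range(size):
            for j in range(size):
                out[i][j] += sum(ch[i + a][j + b] * ker[a][b]
                                 for a in range(k) for b in range(k))
    return out

def relu(fmap: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2(fmap: Grid) -> Grid:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    n = len(fmap) // 2
    return [[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                 fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
             for j in range(n)] for i in range(n)]

def random_params(n_channels: int, n_filters: int, half: int,
                  k: int = 3, seed: int = 0) -> dict:
    """Untrained, randomly initialised weights with hypothetical sizes."""
    rng = random.Random(seed)
    conv = [([[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
              for _ in range(n_channels)], 0.0)
            for _ in range(n_filters)]
    pooled = (2 * half + 1 - k + 1) // 2  # side length after conv + pooling
    n_features = n_filters * pooled * pooled
    return {"half": half, "conv": conv,
            "fc_w": [rng.gauss(0.0, 0.1) for _ in range(n_features)],
            "fc_b": 0.0}

def predict_pm25(predictors: List[Grid], row: int, col: int, params: dict) -> float:
    # 1. In-situ plus neighbouring information: one patch per predictor grid.
    patches = [extract_patch(g, row, col, params["half"]) for g in predictors]
    # 2. Convolution -> ReLU -> max pooling for each filter, then flatten.
    features: List[float] = []
    for kernels, bias in params["conv"]:
        pooled = max_pool2(relu(conv2d_valid(patches, kernels, bias)))
        features.extend(v for r in pooled for v in r)
    # 3. A fully connected output unit gives the PM2.5 estimate.
    return sum(w * x for w, x in zip(params["fc_w"], features)) + params["fc_b"]

def mse(y_true: List[float], y_pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """R2 as 1 - SS_res / SS_tot (one common definition)."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 10x10 grids standing in for real gridded predictors.
    grids = [[[rng.random() for _ in range(10)] for _ in range(10)] for _ in range(2)]
    params = random_params(n_channels=2, n_filters=4, half=3)
    print(predict_pm25(grids, 5, 5, params))
```

A real model would stack several such layers and learn the weights with a deep learning framework; the explicit loops here only make each step (patch extraction, convolution, pooling, dense output) visible.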