Abstract: Ambient exposure to fine particulate matter (PM2.5) is known to harm public health in China. Satellite remote sensing measurements of aerosol optical depth (AOD) have been statistically associated with in-situ observations after 2013 to predict PM2.5 concentrations nationwide, but the lack of surface monitoring data before 2013 has made historical PM2.5 exposure difficult to estimate. Hindcast approaches using statistical models or chemical transport models (CTMs) were developed to overcome this limitation, but they still suffer from incomplete daily coverage due to missing AOD data or from limited accuracy due to uncertainties in CTMs. Here we developed a new machine learning (ML) model with high-dimensional expansion (HD-expansion) of numerous predictors (including AOD and other satellite covariates, meteorological variables and CTM simulations). By comprehensively characterizing the nonlinear effects of, and interactions among, different predictors, the HD-expansion parameterized the association between PM2.5 and AOD as a nonlinear function of space and time covariates (e.g., planetary boundary layer height and relative humidity). In this way, the PM2.5-AOD association can vary spatiotemporally. We trained the model with data from 2013 to 2016 and evaluated its performance using annually-iterated cross-validation, which iteratively held out the in-situ observations for a whole calendar year (as testing data) and compared them with predictions from a model trained on the rest of the observations. Our estimates were in good agreement with in-situ observations, with coefficients of determination (R2) of 0.61, 0.68, and 0.75 for daily, monthly and annual averages, respectively. To interpolate the predictions missing due to incomplete AOD data, we incorporated a generalized additive model into the ML model.
The two-stage estimates of PM2.5 sacrificed some prediction accuracy at the daily timescale (R2 = 0.55), but achieved complete spatiotemporal coverage and improved the accuracy of monthly (R2 = 0.71) and annual (R2 = 0.77) averages. The model was then used to predict daily PM2.5 concentrations across China during 2000-2016 and to estimate long-term trends in PM2.5 for that period. We found that population-weighted concentrations of PM2.5 increased significantly, by 2.10 (95% confidence interval (CI): 1.74, 2.46) μg/m3/year during 2000-2007, and decreased rapidly, by 4.51 (3.12, 5.90) μg/m3/year during 2013-2016. In this study, we produced AOD-based estimates of historical PM2.5 with complete spatiotemporal coverage, which proved accurate, particularly over the medium and long term. The products could support large-scale epidemiological studies and risk assessments of ambient PM2.5 in China and can be accessed via the website (http://www.meicmodel.org/dataset-phd.html).
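
The annually-iterated cross-validation described in the abstract can be illustrated with a minimal sketch. It assumes synthetic data and uses ordinary least squares as a stand-in for the actual ML model; the names annual_cv, fit_linear, predict_linear and r_squared, and the two predictor columns, are hypothetical and not taken from the study.

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination between observations and predictions."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_linear(X, y):
    """Stand-in model (ordinary least squares), NOT the study's ML model."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted stand-in model to new predictor rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def annual_cv(X, y, years):
    """Hold out each calendar year in turn and train on the remaining years."""
    pred = np.empty(len(y), dtype=float)
    for yr in np.unique(years):
        test = years == yr                      # all records from one year
        coef = fit_linear(X[~test], y[~test])   # train on the other years
        pred[test] = predict_linear(coef, X[test])
    return pred

# Synthetic illustration only: two hypothetical predictors and a noisy response.
rng = np.random.default_rng(0)
n = 400
years = np.repeat([2013, 2014, 2015, 2016], n // 4)
X = rng.normal(size=(n, 2))
y = 50 + 20 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=5, size=n)

cv_pred = annual_cv(X, y, years)
cv_r2 = r_squared(y, cv_pred)
print(f"Annually-iterated CV R2 on synthetic data: {cv_r2:.2f}")
```

Because every prediction for a given year comes from a model that never saw that year's observations, the resulting R2 approximates hindcast skill for years without monitoring data, which is the situation before 2013.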

Predicting Personal Exposure to PM2.5 Using Different Determinants and Machine Learning Algorithms in Two Megacities, China

Estimating Ground-Level PM10 in a Chinese City by Combining Satellite Data, Meteorological Information and a Land Use Regression Model

Application of the XGBoost Machine Learning Method in PM2.5 Prediction: A Case Study of Shanghai

National Scale Spatiotemporal Land-Use Regression Model for PM2.5, PM10 and NO2 Concentration in China

Time series-based PM2.5 concentration prediction in Jing-Jin-Ji area using machine learning algorithm models

Predicting PM2.5 levels and exceedance days using machine learning methods

Prediction of PM2.5 Concentration Using Spatiotemporal Data with Machine Learning Models

Estimation of Personal PM2.5 and BC Exposure by a Modeling Approach - Results of a Panel Study in Shanghai, China.

Machine learning and deep learning modeling and simulation for predicting PM2.5 concentrations

Super-learning and Ensemble Weighted Averaging Models to Predict Hyperlocal Long-Term Exposure to Fine Particulate Matter Components in the United States

Enhancing indoor PM2.5 predictions based on land use and indoor environmental factors by applying machine learning and spatial modeling approaches

Application of machine learning algorithms to improve numerical simulation prediction of PM2.5 and chemical components

A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa

Spatiotemporal dynamics and exposure analysis of daily PM2.5 using a remote sensing-based machine learning model and multi-time meteorological parameters

Predicting intraurban PM2.5 concentrations using enhanced machine learning approaches and incorporating human activity patterns

Short-Term PM2.5 Concentration Changes Prediction: A Comparison of Meteorological and Historical Data

Spatiotemporal Continuous Estimates of PM2.5 Concentrations in China, 2000-2016: A Machine Learning Method with Inputs from Satellites, Chemical Transport Model, and Ground Observations.

Modeling spatial variation of gaseous air pollutants and particulate matters in a Metropolitan area using mobile monitoring data

Reliability Assessment of PM2.5 Concentration Monitoring Data: A Case Study of China

Construction and evaluation of hourly average indoor PM2.5 concentration prediction models based on multiple types of places

Prediction of size-fractionated airborne particle-bound metals using MLR, BP-ANN and SVM analyses