Abstract: Urban areas contribute substantially to human exposure to ambient air pollution. Numerous statistical prediction models have been used to estimate ambient concentrations of fine particulate matter (PM2.5) and other pollutants in urban environments, with some incorporating machine learning (ML) algorithms to improve predictive power. However, many ML approaches for predicting ambient pollutant concentrations to date have paired principal component analysis (PCA) with traditional regression algorithms, which reduces the dimensionality of the data but captures only linear correlations between variables. Moreover, while most urban air quality prediction models have traditionally incorporated explanatory variables such as meteorological, land use, transportation/mobility, and/or co-pollutant factors, recent research has shown that local emissions from building infrastructure may also be useful in estimating urban pollutant concentrations. Here we propose an enhanced ML approach for predicting urban ambient PM2.5 concentrations that hybridizes cascade and PCA methods to reduce the dimensionality of the data space and explore nonlinear effects between variables. We test the approach on time series of hourly PM2.5 concentrations of different durations from three air quality monitoring sites in different urban neighborhoods in Chicago, IL, to explore the influence of dynamic human-related factors, including mobility (i.e., traffic) and building occupancy patterns, on model performance. We test nine state-of-the-art ML algorithms to find the most effective algorithm for modeling intraurban PM2.5 variations, and we explore the relative importance of each set of factors for intraurban air quality model performance. Results demonstrate that Gaussian-kernel support vector regression (SVR) was the most effective ML algorithm tested, improving accuracy by 118% compared to a traditional multiple linear regression (MLR) approach. Incorporating the enhanced approach with the SVR algorithm increased model performance by up to 18.4% for year-long hourly datasets and by up to 98.7% for month-long hourly datasets. Incorporating assumptions for human occupancy patterns in dominant building typologies improved model performance by between 4% and 37%. Combined, these innovations can be used to improve the performance and accuracy of urban air quality prediction models compared to conventional approaches.
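
The comparison between Gaussian-kernel SVR and MLR described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the predictors and the response are hypothetical stand-ins for meteorological, traffic, and occupancy inputs and hourly PM2.5, the cascade step is omitted, and the choice of three principal components and `C=10.0` is an assumption made only for illustration. The sketch uses scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 3 latent drivers observed through
# 8 correlated predictors, with a nonlinear response playing the role of PM2.5.
rng = np.random.default_rng(0)
n = 600
latent = rng.normal(size=(n, 3))
mixed = latent @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(n, 5))
X = np.hstack([latent, mixed])
y = np.sin(2 * latent[:, 0]) + latent[:, 1] ** 2 + 0.1 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: standardized predictors fed to multiple linear regression (MLR).
mlr = make_pipeline(StandardScaler(), LinearRegression())

# PCA for dimensionality reduction, then SVR with a Gaussian (RBF) kernel
# so that nonlinear effects between the reduced variables can be captured.
svr = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    SVR(kernel="rbf", C=10.0, epsilon=0.1),
)

mlr.fit(X_train, y_train)
svr.fit(X_train, y_train)

r2_mlr = r2_score(y_test, mlr.predict(X_test))
r2_svr = r2_score(y_test, svr.predict(X_test))
print(f"MLR test R^2: {r2_mlr:.3f}")
print(f"PCA + RBF-SVR test R^2: {r2_svr:.3f}")
```

Because the synthetic response is deliberately nonlinear, the kernel model is expected to score higher than the linear baseline here; the size of that gap says nothing about the improvements reported for the Chicago datasets.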

Variable importance measure for spatial machine learning models with application to air pollution exposure prediction

National Scale Spatiotemporal Land-Use Regression Model for PM2.5, PM10 and NO2 Concentration in China

Developing high-resolution PM2.5 exposure models by integrating low-cost sensors, automated machine learning, and big human mobility data

Spatial and spatiotemporal modelling of intra-urban ultrafine particles: A comparison of linear, nonlinear, regularized, and machine learning methods

Automatic Region-wise Spatially Varying Coefficient Regression Model: an Application to National Cardiovascular Disease Mortality and Air Pollution Association Study

A comparison of statistical and machine learning methods for creating national daily maps of ambient PM$_{2.5}$ concentration

Unveiling the Transparency of Prediction Models for Spatial PM2.5 over Singapore: Comparison of Different Machine Learning Approaches with eXplainable Artificial Intelligence

A comparison of statistical and machine learning models for spatio-temporal prediction of ambient air pollutant concentrations in Scotland

Modeling spatial variation of gaseous air pollutants and particulate matters in a Metropolitan area using mobile monitoring data

Predicting intraurban PM2.5 concentrations using enhanced machine learning approaches and incorporating human activity patterns

A spatial interference approach to account for mobility in air pollution studies with multivariate continuous treatments

Air pollution models in epidemiologic studies with geostatistics and machine learning

Super-learning and Ensemble Weighted Averaging Models to Predict Hyperlocal Long-Term Exposure to Fine Particulate Matter Components in the United States

Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data

High Spatial Resolution Land-Use Regression Model for Urban Ultrafine Particle Exposure Assessment in Shanghai, China

Quantification of multifactorial effects on particle distributions at urban neighborhood scale using machine learning and unmanned aerial vehicle measurement

A High Resolution Spatiotemporal Fine Particulate Matter Exposure Assessment Model for the Contiguous United States

Quantifying the contribution of environmental variables to cyclists' exposure to PM2.5 using machine learning techniques

Controlling for unmeasured confounding and spatial misalignment in long-term air pollution and health studies

A review of machine learning for modeling air quality: Overlooked but important issues

A comparison of statistical and machine-learning approaches for spatiotemporal modeling of nitrogen dioxide across Switzerland