Abstract: Dust, or particulate matter (PM2.5), is among the most harmful pollutants affecting human health. Predicting indoor PM2.5 concentrations is essential to achieving acceptable indoor air quality. This study investigates data-driven models for accurately predicting PM2.5 pollution. Notably, a comparative study of twenty-one machine learning and deep learning models has been conducted to predict PM2.5 levels. Specifically, we investigate how well these models predict ambient PM2.5 concentrations from other ambient pollutants, including SO₂, NO₂, O₃, CO, and PM10.
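The static setting described above can be framed as a supervised regression dataset: each hourly record supplies the other pollutants as inputs and PM2.5 as the target. The sketch below is illustrative only. The field names, the rule of skipping incomplete hours, the hypothetical helpers `build_static_dataset` and `chronological_split`, and the 80/20 split are assumptions, not the study's actual data schema or protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical field names for one hourly station record (assumed schema).
INPUT_POLLUTANTS = ["SO2", "NO2", "O3", "CO", "PM10"]
TARGET = "PM2.5"


def build_static_dataset(
    records: List[Dict[str, float]],
) -> Tuple[List[List[float]], List[float]]:
    """Map each complete hourly record to (other pollutants, PM2.5).

    Hours missing any required field are skipped (an assumed cleaning rule).
    """
    X: List[List[float]] = []
    y: List[float] = []
    for rec in records:
        if all(key in rec for key in INPUT_POLLUTANTS + [TARGET]):
            X.append([rec[key] for key in INPUT_POLLUTANTS])
            y.append(rec[TARGET])
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Split without shuffling so the test hours come after the training hours."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

Keeping the split chronological avoids training on hours that follow the test period, which matters once past measurements are added as inputs.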
Here, we applied Bayesian optimization to tune the hyperparameters of Gaussian process regression (GPR) with different kernels and of ensemble learning models (i.e., boosted trees and bagged trees), and investigated their prediction performance. To further enhance the forecasting performance of the investigated models, dynamic information was incorporated by introducing lagged measurements into the construction of the considered models. Results show a significant improvement in prediction performance when dynamic information from past data is considered. Moreover, three methods, namely random forest (RF), decision tree, and extreme gradient boosting (XGBoost), were applied to assess the contribution of each variable; they revealed that lagged PM2.5 data contribute significantly to prediction performance and enable the construction of parsimonious models. Hourly concentration levels of ambient air pollution from the air quality monitoring network in Seoul are used to verify the prediction effectiveness of the studied models, and six performance metrics are used to assess prediction quality. Results showed that deep learning models perform better than the other investigated machine learning models (i.e., support vector regression (SVR), GPR, bagged and boosted trees, RF, and XGBoost).
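Incorporating dynamic information through lagged measurements, and scoring the resulting predictions, can be sketched in a few lines. The abstract does not state how many lags were used, which variables were lagged, or which six metrics were reported, so the hypothetical `add_lagged_features` helper, its `n_lags` parameter, the choice to lag only PM2.5, and the use of RMSE, MAE, and R² below are all assumptions chosen as common examples.

```python
import math
from typing import List, Sequence, Tuple


def add_lagged_features(
    X: Sequence[Sequence[float]], y: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Append the previous n_lags PM2.5 values to each hour's inputs.

    Row t becomes X[t] + [y[t-1], ..., y[t-n_lags]]; the first n_lags rows
    are dropped because their history is incomplete.
    """
    X_lag: List[List[float]] = []
    y_lag: List[float] = []
    for t in range(n_lags, len(y)):
        past = [y[t - k] for k in range(1, n_lags + 1)]
        X_lag.append(list(X[t]) + past)
        y_lag.append(y[t])
    return X_lag, y_lag


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with inputs `[[1], [2], [3], [4]]`, PM2.5 series `[10, 20, 30, 40]`, and `n_lags=2`, the lagged inputs are `[[3, 20, 10], [4, 30, 20]]` with targets `[30, 40]`. Lagging the other pollutants would follow the same pattern.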
In particular, the results showed that the bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) networks achieve higher performance than both the investigated machine learning models (i.e., SVR, GPR, bagged and boosted trees, RF, and XGBoost) and the other deep learning models (i.e., LSTM, GRU, and convolutional neural network).
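What makes BiLSTM and BiGRU "bidirectional" is that one recurrent cell reads the input window forward in time, a second reads it backward, and their states are paired at each time step. The sketch below shows only that wiring. As an assumption for brevity, it swaps in a scalar tanh (Elman-style) cell in place of LSTM or GRU cells, and the hypothetical helpers `rnn_pass` and `bidirectional_pass` use fixed, untrained parameters.

```python
import math
from typing import List, Sequence, Tuple

# (w_in, w_rec, bias) for the stand-in scalar cell; hypothetical parameters.
CellParams = Tuple[float, float, float]


def rnn_pass(xs: Sequence[float], w_in: float, w_rec: float, b: float) -> List[float]:
    """Run h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b) with h_0 = 0."""
    h = 0.0
    states: List[float] = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states


def bidirectional_pass(
    xs: Sequence[float], fwd_params: CellParams, bwd_params: CellParams
) -> List[List[float]]:
    """Pair forward and backward states at each time step.

    The backward cell reads the sequence in reverse; its states are reversed
    again so that index t of both lists refers to the same hour.
    """
    fwd = rnn_pass(xs, *fwd_params)
    bwd = rnn_pass(list(reversed(xs)), *bwd_params)[::-1]
    return [[f, b] for f, b in zip(fwd, bwd)]
```

In an actual BiLSTM or BiGRU, each scalar cell would be replaced by an LSTM or GRU cell with vector states, the paired states would feed an output layer that predicts PM2.5, and all weights would be learned; training is omitted here.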

Machine Learning Based PM 2.5 and 10 Concentration Modeling for Delhi City

Ambient PM2.5 Estimates and Variations During COVID-19 Pandemic in the Yangtze River Delta Using Machine Learning and Big Data

Data-driven predictive modeling of PM2.5 concentrations using machine learning and deep learning techniques: a case study of Delhi, India

Modelling PM2.5 for Data-Scarce Zone of Northwestern India using Multi Linear Regression and Random Forest Approaches

Integrating machine learning techniques for Air Quality Index forecasting and insights from pollutant-meteorological dynamics in sustainable urban environments

Utilizing LSTM models to predict PM 2.5 levels during critical episodes in Delhi, the world's most polluted capital city

Low-cost nature-inspired deep learning system for PM2.5 forecast over Delhi, India

Extracting Regional and Temporal Features to Improve Machine Learning for Hourly Air Pollutants in Urban India

Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air

Evaluation of Non-stationary Spatial Relationship between Meteorological-Environmental Parameters and PM 2.5

Air pollution prediction with machine learning: a case study of Indian cities

Performance analysis of machine learning models for AQI prediction in Gorakhpur City: a critical study

A novel seasonal index–based machine learning approach for air pollution forecasting

AI-based prediction of the improvement in air quality induced by emergency measures

Mapping the Spatiotemporal Variability of Particulate Matter Pollution in Delhi: Insights from Land Use Regression Modelling

PM 2.5 concentration forecasting: Development of integrated multivariate variational mode decomposition with kernel Ridge regression and weighted mean of vectors optimization

Deep Insight into Urban Air Quality Utilizing Neural Networks for Enhanced Prediction in Korean Cities Where Factories and Ecosystem Environments Coexists

Machine learning and deep learning‐driven methods for predicting ambient particulate matters levels: A case study

Nationwide estimation of daily ambient PM2.5 from 2008 to 2020 at 1 km2 in India using an ensemble approach

Transforming air pollution management in India with AI and machine learning technologies