Machine learning and deep learning‐driven methods for predicting ambient particulate matters levels: A case study

Amin Wu, Fouzi Harrou, Abdelkader Dairi, Ying Sun
DOI: https://doi.org/10.1002/cpe.7035
2022-04-26
Concurrency and Computation: Practice and Experience
Abstract: Dust, or particulate matter (PM2.5), is among the most harmful pollutants negatively affecting human health. Predicting indoor PM2.5 concentrations is essential to achieve acceptable indoor air quality. This study aims to investigate data‐driven models to accurately predict PM2.5 pollution. Notably, a comparative study has been conducted between twenty‐one machine learning and deep learning models to predict PM2.5 levels. Specifically, we investigate the performance of machine learning and deep learning models to predict ambient PM2.5 concentrations based on other ambient pollutants, including SO2, NO2, O3, CO, and PM10.
Here, we applied Bayesian optimization to tune the hyperparameters of Gaussian process regression with different kernels and of ensemble learning models (i.e., boosted trees and bagged trees), and investigated their prediction performance. To further enhance the forecasting performance of the investigated models, dynamic information has been incorporated by introducing lagged measurements in the construction of the considered models. Results show a significant improvement in the prediction performance when considering dynamic information from past data. Moreover, three methods, namely, random forest (RF), decision tree, and extreme gradient boosting, are applied to assess variable contributions and reveal that lagged PM2.5 data contribute significantly to the prediction performance and enable the construction of parsimonious models. Hourly concentration levels of ambient air pollution from the air quality monitoring network located in Seoul are employed to verify the prediction effectiveness of the studied models. Six measures of effectiveness are used for assessing the prediction quality. Results showed that deep learning models are more efficient than the other investigated machine learning models (i.e., SVR, GPR, bagged and boosted trees, RF, and XGBoost).
Also, the results showed that the bidirectional long short‐term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU) networks produce higher performance than the investigated machine learning models (i.e., SVR, GPR, bagged and boosted trees, RF, and XGBoost) and deep learning models (i.e., LSTM, GRU, and convolutional neural network).
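The abstract's idea of adding dynamic information through lagged measurements can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `make_lagged_dataset`, the choice of two lags, and the toy readings are all assumptions made for illustration, and the abstract does not state how many lags or which lagged variables the study actually used. Each row combines the current values of the other pollutants with the previous values of every series, including PM2.5 itself.

```python
from typing import Dict, List, Sequence, Tuple


def make_lagged_dataset(
    series: Dict[str, Sequence[float]],
    target: str,
    n_lags: int,
) -> Tuple[List[List[float]], List[float], List[str]]:
    """Build (X, y, feature_names) for one-step prediction of `target`.

    Each row holds the current values of the non-target series followed by
    the previous `n_lags` values of every series (target included).
    Hypothetical helper, not taken from the paper.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    length = len(series[target])
    if any(len(values) != length for values in series.values()):
        raise ValueError("all series must have the same length")

    predictors = [name for name in series if name != target]
    names = list(predictors)
    for name in series:
        for k in range(1, n_lags + 1):
            names.append(f"{name}_lag{k}")

    X: List[List[float]] = []
    y: List[float] = []
    # The first n_lags time steps have no complete history, so skip them.
    for t in range(n_lags, length):
        row = [series[name][t] for name in predictors]
        for name in series:
            for k in range(1, n_lags + 1):
                row.append(series[name][t - k])
        X.append(row)
        y.append(series[target][t])
    return X, y, names


# Toy hourly readings (made-up numbers, not data from the study).
toy = {
    "PM10": [40.0, 42.0, 45.0, 43.0, 41.0],
    "NO2": [0.020, 0.022, 0.025, 0.024, 0.021],
    "PM2.5": [18.0, 19.0, 22.0, 21.0, 20.0],
}
X, y, names = make_lagged_dataset(toy, target="PM2.5", n_lags=2)
```

The resulting `X` and `y` could then be passed to any of the regressors named in the abstract. Features such as `PM2.5_lag1` are the kind of lagged input that the variable-contribution analysis reports as most informative.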
What problem does this paper attempt to address?