Predicting Personal Exposure to PM2.5 Using Different Determinants and Machine Learning Algorithms in Two Megacities, China

Na Li, Yunpu Li, Dongqun Xu, Zhe Liu, Ning Li, Ryan Chartier, Junrui Chang, Qin Wang, Chunyu Xu
DOI: https://doi.org/10.1155/2024/5589891
IF: 6.5539
2024-03-08
Indoor Air
Abstract: The primary aim of this study is to explore the utility of machine learning algorithms for predicting personal PM2.5 exposures of elderly participants and to evaluate the effect of individual variables on model performance. Personal PM2.5 was measured on five consecutive days across seasons in 66 retired adults in Beijing (BJ) and Nanjing (NJ), China. The potential predictors were extracted from routine monitoring data (ambient PM2.5 concentrations and meteorological factors), basic questionnaires (personal and household characteristics), and a time-activity diary (TAD). Prediction models were developed based on either traditional multiple linear regression (MLR) or five advanced machine learning methods. Our results revealed that personal PM2.5 exposures were well predicted by both MLR and machine learning models with predictors extracted from routine monitoring data, as indicated by high nested cross-validation (CV) R² values ranging from 0.76 to 0.88. The addition of predictors from either the questionnaire or the TAD did not improve predictive accuracy for all algorithms. The ambient PM2.5 concentration was the most important predictor. Overall, the random forest (RF), support vector machine, and extreme gradient boosting algorithms outperformed the reference MLR method. Compared with the traditional MLR approach, the CV R² of the RF model increased by up to 7% (from 0.82 ± 0.13 to 0.88 ± 0.10), while the RMSE decreased by up to 18% (from 19.8 ± 5.4 to 16.3 ± 4.5) in BJ.
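The "up to 7%" and "up to 18%" figures read most naturally as relative changes in the reported means, not absolute differences. The short sketch below checks that reading using only the Beijing values quoted in the abstract. The helper name `relative_change` is illustrative and not taken from the paper, and the calculation ignores the reported standard deviations.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Mean values quoted in the abstract for Beijing (MLR baseline vs. RF model).
r2_mlr, r2_rf = 0.82, 0.88
rmse_mlr, rmse_rf = 19.8, 16.3

# Relative increase in CV R-squared when moving from MLR to RF.
r2_gain = relative_change(r2_mlr, r2_rf)

# Relative reduction in RMSE (sign flipped so a decrease is positive).
rmse_drop = -relative_change(rmse_mlr, rmse_rf)

print(f"CV R2 relative increase: {r2_gain:.1f}%")
print(f"RMSE relative reduction: {rmse_drop:.1f}%")
```

Under this reading, the two values round to 7% and 18%, which agrees with the abstract.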
engineering, environmental; public, environmental & occupational health; construction & building technology