Assessment of Machine Learning Algorithms in Short-term Forecasting of PM10 and PM2.5 Concentrations in Selected Polish Agglomerations

Michał Marosz, Joanna Jędruszkiewicz, B. Czernecki
DOI: https://doi.org/10.4209/AAQR.200586
2021-03-25
Aerosol and Air Quality Research
Abstract: Air pollution continues to have a significant impact on Europeans living in urban areas. Each year, episodes of elevated PMx concentration are responsible for a large number of premature deaths (mostly due to heart disease and stroke). According to annual EEA reports, Poland is one of the most polluted countries in Europe. High winter PMx concentrations are mostly the result of high emissions and unfavourable weather conditions combined with environmental features. Accurate PMx concentration forecasts are therefore crucial, so that the public can be alerted in time and the necessary municipal mitigation schemes put in place. The research aims to assess the feasibility of short-term forecasting of PMx concentrations by means of machine learning tools, with subsequent identification of the primary meteorological covariates. The data comprise 10 years of winter hourly PM10 and PM2.5 concentrations in 4 large Polish agglomerations: Poznan, Krakow, Łódź, and Gdansk. The research covered a total of 11 urban air quality monitoring stations, including background, traffic, and industrial types. The selected cities cover areas of high population density and quite diverse environments, stretching from the Baltic Sea coast (Tricity), through lowlands (Łódź, Poznan), to highlands (Krakow). We applied four ML models: stepwise regression (AIC-based), two tree-based algorithms (Random Forest and XGBoost), and a neural network model. The analysis and the application of a cross-validation scheme provided a clear assessment of the optimal algorithm. The presented study confirms the high applicability of ML tools for short-term air quality prediction with the perfect prog approach. Among the algorithms used, there is a clear ranking: the linear method achieved the worst results, with gradual improvement through Neural Networks and Random Forest, and the XGBoost algorithm providing the best results. This ranking is apparent both in the regression approach and in binary forecasts of threshold exceedance.
Environmental Science, Computer Science