Data-driven approach for air pollutant concentrations forecasting: A window-based multi-output GBRT approach

Shikang Du, Siyu Chen, Shanling Cheng, Jiaqi He, Chuwei Liu, Lulu Lian, Chao Zhang, Dan Zhao, Nan Yin, Yawen Guan
DOI: https://doi.org/10.1016/j.atmosres.2024.107459
IF: 5.965
2024-05-23
Atmospheric Research
Abstract: Air pollutant concentration forecasting is essential for preventing harmful public exposure, improving air quality, and shaping effective environmental policy. Machine-learning-based forecasting of air pollutant concentrations has recently advanced considerably with the development of data-driven approaches. However, constraints on multi-source data and the architectural complexity of machine learning approaches limit their use in real-world scenarios. Herein we present a window-based multi-output gradient boosting regression tree (WM-GBRT) approach that predicts air pollutant concentrations for the next 48 h at each monitoring station, using air pollutant monitoring data, meteorological monitoring data, NCEP-GFS data, and GEOS-CF data. The key innovations of our approach lie in the flexible input window configuration and the implementation of a multi-step independent strategy. In experiments conducted in China against existing data-driven baselines, our approach achieved the best performance in forecasting the concentrations of six air pollutants, improving on the second-best baselines by 6.94% (PM2.5), 12.3% (PM10), 3.84% (O3), 30.6% (SO2), 9.84% (NO2), and 93.9% (CO). This simple yet efficient approach can be easily applied and provides a reliable technical method for predicting air pollutant concentrations.
meteorology & atmospheric sciences
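The two ideas the abstract highlights, an input window and a multi-step independent strategy, can be illustrated with a toy sketch. It reads "multi-step independent" as the direct strategy (one separate model per forecast lead time) and makes several assumptions that do not come from the paper: a single univariate hourly series, a fixed contiguous input window, and a hand-rolled least-squares gradient boosting with depth-1 trees (stumps) standing in for a full GBRT library. All function names (`make_windows`, `fit_gbrt`, `fit_wm_gbrt`, `forecast`, etc.) are hypothetical. This is not the authors' implementation.

```python
"""Toy sketch of a window-based, multi-output GBRT using a multi-step
independent (direct) strategy: one boosted model per forecast lead time.

Assumptions (not taken from the paper): one univariate hourly series, a
fixed contiguous input window, and a hand-rolled least-squares gradient
boosting with depth-1 trees (stumps) standing in for a full GBRT library.
All function names are hypothetical.
"""
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def make_windows(series: List[float], window: int, horizon: int
                 ) -> Tuple[List[List[float]], List[List[float]]]:
    """Pair each input window of past values with the next `horizon` values."""
    X, Y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])   # inputs: the `window` most recent values
        Y.append(series[t:t + horizon])  # targets: lead times 1..horizon
    return X, Y


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Choose the split (feature, threshold) that minimises squared error on r."""
    n, best = len(X), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]


def fit_gbrt(X, y, n_estimators=50, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        j, thr, lm, rm = fit_stump(X, resid)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if x[j] <= thr else rm) for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbrt(model, x: List[float]) -> float:
    """Sum the base value and the shrunken contributions of all stumps."""
    base, lr, stumps = model
    return base + lr * sum(lm if x[j] <= thr else rm for j, thr, lm, rm in stumps)


def fit_wm_gbrt(series, window, horizon, **kwargs):
    """Multi-step independent strategy: train a separate model for every lead time h."""
    X, Y = make_windows(series, window, horizon)
    return [fit_gbrt(X, [row[h] for row in Y], **kwargs) for h in range(horizon)]


def forecast(models, recent: List[float]) -> List[float]:
    """Predict all lead times from the latest input window, one model per step."""
    return [predict_gbrt(m, recent) for m in models]


if __name__ == "__main__":
    hourly = [float(t % 24) for t in range(24 * 3)]  # toy series with a daily cycle
    models = fit_wm_gbrt(hourly, window=6, horizon=3, n_estimators=10)
    print(forecast(models, hourly[-6:]))
```

Because each lead time has its own model, an error at hour 1 is not fed back as an input for hour 2, as it would be in a recursive strategy. The cost is training one model per step, which would mean 48 models for the 48 h horizon described above. The demo keeps the horizon small so the pure-Python stumps stay fast. In a multi-source setting, exogenous inputs such as meteorological observations or NCEP-GFS and GEOS-CF forecast fields could plausibly be appended to each window row. How the authors actually configure their flexible window is not specified in this abstract.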
What problem does this paper attempt to address?