High-Resolution Mapping of Regional NMVOCs Using the Fast Space-Time Light Gradient Boosting Machine (LightGBM)

Bingqing Lu, Chao Liu, Xue Meng, Zekun Zhang, Hartmut Herrmann, Xiang Li
DOI: https://doi.org/10.1029/2023jd039591
2023-01-01
Journal of Geophysical Research Atmospheres
Abstract: Accurate spatiotemporal estimation of non-methane volatile organic compounds (NMVOCs) plays a pivotal role in establishing sophisticated early warning systems and formulating strategies to combat air pollution. Despite these critical applications, robust estimation of NMVOCs concentrations at high spatiotemporal resolution remains a challenge. In this study, we develop a space-time Light Gradient Boosting Machine (STLGB) model, which renders hourly maps of NMVOCs concentrations across Shanghai from 2019 to 2022 by integrating spatiotemporal information. After extensive training, the STLGB model, which accounts for multiple spatiotemporal variables, demonstrates remarkable estimation performance for NMVOCs (R² = 0.92, RMSE = 34.52 ppb). With the developed model, we provide the first high-resolution (1 km) hourly NMVOCs concentration maps, uncovering previously overlooked spatiotemporal variations. Further, SHapley Additive exPlanation (SHAP) regression values reveal significant local interpretation capabilities of the STLGB model, emphasizing the strong influence of emissions on NMVOCs estimation while acknowledging the important contribution of the space and time terms. Our study of the pandemic lockdown further showcases the model's adaptability to unusual events influenced by policy changes. The superior performance of the STLGB model, with its minimal memory requirements and faster speed, makes it an ideal tool for air pollutant estimation, adaptable to any region with NMVOCs monitoring capabilities.
Plain Language Summary: Non-methane volatile organic compounds (NMVOCs) are significant air pollutants that have severe effects on health and the environment. Although VOCs monitoring stations can provide data with good temporal resolution, the low spatial resolution of these stations remains a challenge. Accurate estimation of NMVOCs concentrations requires new methods to address these spatial resolution issues. In recent years, machine learning-based models have emerged as a promising alternative for air pollution estimation. However, research on high-resolution mapping of NMVOCs concentrations using machine learning models is limited. This study provides the first predicted spatial distribution of hourly NMVOCs concentrations deduced from sparse observations using machine learning models. We developed a space-time LightGBM model to estimate NMVOCs concentrations at 1 km spatial and hourly temporal resolution in Shanghai. We also use SHapley Additive exPlanations (SHAP) to quantify and visualize the complex relationships between the input variables in the model. Moreover, we offer an alternative approach to air pollution modeling for unusual events (such as the COVID-19 lockdown).
Key Points:
A space-time LightGBM (STLGB) machine learning model is used to estimate NMVOCs reliably.
Hourly NMVOCs maps were produced at 1 km resolution by the STLGB model.
The STLGB model shows good performance, with a cross-validation R² of 0.92.
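To make the "space and time terms" and the reported metrics (R², RMSE) more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how space and time enter the model, so the cyclic hour and day-of-year encoding, the function names, and the toy numbers below are all assumptions for illustration only.

```python
import math


def space_time_features(lon, lat, hour, day_of_year):
    """Hypothetical space-time terms: raw coordinates plus cyclic time encodings.

    Assumption: the paper's actual feature construction is not given in the
    abstract; sine/cosine encoding is one common way to represent periodic time.
    """
    hour_angle = 2 * math.pi * hour / 24
    day_angle = 2 * math.pi * day_of_year / 365.25
    return [
        lon,
        lat,
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(day_angle),
        math.cos(day_angle),
    ]


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. ppb)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy values (made up, not from the paper) standing in for held-out
# observations and model estimates in ppb.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

features = space_time_features(lon=121.47, lat=31.23, hour=6, day_of_year=100)
print(features)
print(rmse(observed, predicted), r_squared(observed, predicted))
```

In a full workflow, feature vectors like these would be combined with emission and meteorological predictors and passed to a gradient boosting regressor such as LightGBM, with R² and RMSE computed on held-out folds. The cross-validation scheme used in the paper is not stated in the abstract.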