Enhancing PM2.5 Data Imputation and Prediction in Air Quality Monitoring Networks Using a KNN-SINDy Hybrid Model

Yohan Choi, Boaz Choi, Jachin Choi
2024-09-18
Abstract: Air pollution, particularly fine particulate matter (PM2.5), poses significant risks to public health and the environment, so effective air quality management depends on accurate prediction and continuous monitoring. However, air quality monitoring (AQM) data often suffer from missing records due to various technical difficulties. This study explores the application of Sparse Identification of Nonlinear Dynamics (SINDy) to imputing missing PM2.5 data by prediction, using training data from 2016, and compares its performance with the established Soft Impute (SI) and K-Nearest Neighbors (KNN) methods.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of missing fine particulate matter (PM2.5) records in air quality monitoring data. Specifically, it examines whether combining the Sparse Identification of Nonlinear Dynamics (SINDy) framework with established imputation methods, namely Soft Impute (SI) and K-Nearest Neighbors (KNN), improves the accuracy of imputed PM2.5 concentrations. The study uses PM2.5 concentration data from five air quality monitoring stations in Seoul, South Korea, covering 2016 to 2018, and evaluates several imputation methods by introducing different proportions of missing data. The results indicate that the KNN-SINDy hybrid model performed best across the tested proportions of missing data. Its advantage was largest when the proportion of missing data was high, where its prediction accuracy significantly exceeded that of the other methods. This suggests that combining SINDy with traditional imputation methods can make air quality monitoring systems more reliable and thereby support more effective environmental decision-making.
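The evaluation protocol described above (hide a chosen proportion of known values, impute them, then score the imputed values against the hidden truth) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses synthetic hourly readings for five hypothetical stations rather than the Seoul data, a simple hand-written KNN imputer rather than the authors' implementation, and it does not include the SINDy stage. The function names (`mask_at_random`, `knn_impute`, `rmse_on_hidden`), the choice of k, and the RMSE metric are all assumptions made for this sketch.

```python
import math
import random


def mask_at_random(rows, fraction, rng):
    """Hide roughly `fraction` of all cells (set them to None); return masked copy and hidden cell set."""
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = set(rng.sample(cells, int(round(fraction * len(cells)))))
    masked = [
        [None if (i, j) in hidden else v for j, v in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return masked, hidden


def distance(a, b):
    """RMS difference over columns observed in both rows; infinity if no column is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing cell with the mean of that column over the k nearest rows that observed it."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [c for c in candidates if c[0] < math.inf][:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
                continue
            # Fallback (assumption of this sketch): use the column mean when no comparable row exists.
            column = [r[j] for r in rows if r[j] is not None]
            if not column:
                raise ValueError(f"column {j} has no observed values")
            filled[i][j] = sum(column) / len(column)
    return filled


def rmse_on_hidden(truth, filled, hidden):
    """Root-mean-square error evaluated only on the cells that were hidden."""
    return math.sqrt(sum((truth[i][j] - filled[i][j]) ** 2 for i, j in hidden) / len(hidden))


# Synthetic stand-in data: 200 hourly rows, 5 hypothetical stations (NOT the Seoul measurements).
rng = random.Random(0)
truth = [
    [25 + 10 * math.sin(2 * math.pi * t / 24) + 2 * s + rng.gauss(0, 1) for s in range(5)]
    for t in range(200)
]

results = {}
for fraction in (0.1, 0.3, 0.5):
    masked, hidden = mask_at_random(truth, fraction, rng)
    filled = knn_impute(masked, k=5)
    results[fraction] = rmse_on_hidden(truth, filled, hidden)
    print(f"missing {fraction:.0%}: RMSE on hidden cells = {results[fraction]:.3f}")
```

In a hybrid setup such as the one the paper describes, the KNN-filled series would then serve as input to a SINDy model. That second stage is omitted here because the source text does not specify how the two are combined.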