Filling the gaps of in-situ hourly PM2.5 concentration data with the aid of empirical orthogonal function constrained by diurnal cycles

Kaixu Bai, Ke Li, Jianping Guo, Yuanjian Yang, Ni-Bin Chang
DOI: https://doi.org/10.5194/amt-2019-317
2019-01-01
Abstract: Data gaps are frequently observed in the hourly PM2.5 mass concentration records measured by the China national air quality monitoring network. In this study, we propose a novel gap-filling method called the diurnal cycle constrained empirical orthogonal function (DCCEOF) to fill data gaps in hourly PM2.5 concentration records. The method reconstructs the diurnal cycle of PM2.5 from discrete PM2.5 neighborhood fields in space and time and then calibrates it to the level of valid PM2.5 values observed at adjacent times. Prior to gap filling, the possible impacts of varying numbers of data gaps in the hourly PM2.5 time series on PM2.5 daily averages were examined via sensitivity experiments. The results showed that the PM2.5 records had gaps on about 40 % of days, indicating a high frequency of missing data in the hourly PM2.5 records. These gaps could introduce significant bias into daily-averaged PM2.5. In particular, given the same number of gaps, larger biases would be introduced into daily-averaged PM2.5 on clean days than on polluted days. Cross-validation results indicate that the missing values predicted by the DCCEOF method, which accounts for the local diurnal phases of PM2.5, are more accurate and reasonable than those from the conventional spline interpolation approach, especially in reconstructing daily peaks and/or minima that the latter method cannot restore. As a practical application, filling the gaps in the hourly PM2.5 records across China during 2014 to 2019 with the DCCEOF method reduced the average frequency of missingness from 42.6 % to 5.7 %.
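The abstract describes DCCEOF only at a high level. The following is a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. Leading EOFs are computed from complete "neighbor" days (a hypothetical `neighbors` array of shape `(n_days, 24)`). These EOFs are fitted to the observed hours of a gappy day to reconstruct its diurnal cycle. The residual at valid hours is then linearly interpolated across each gap so that filled values join the adjacent observations. The function names, the number of EOFs `k`, and the residual-interpolation calibration step are all illustrative assumptions.

```python
import numpy as np


def eof_basis(neighbors, k=2):
    """Mean diurnal cycle and leading k EOFs from complete neighbor days.

    neighbors: hypothetical (n_days, 24) array of hourly PM2.5 without NaNs,
    e.g. taken from nearby stations and/or adjacent days.
    """
    mean = neighbors.mean(axis=0)
    anomalies = neighbors - mean
    # Rows of vt are the EOFs (patterns over the 24 hours), ordered by variance.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    return mean, vt[:k].T  # shapes (24,) and (24, k)


def reconstruct_diurnal_cycle(day, mean, eofs):
    """Fit EOF amplitudes to the observed hours of `day` (NaN = missing)."""
    obs = ~np.isnan(day)
    # Least-squares EOF amplitudes using only the observed hours.
    amps, *_ = np.linalg.lstsq(eofs[obs], day[obs] - mean[obs], rcond=None)
    return mean + eofs @ amps


def fill_gaps(day, neighbors, k=2):
    """Fill NaN hours with the EOF-reconstructed cycle, calibrated to valid values.

    Assumption: calibration = linear interpolation of the observation-minus-
    reconstruction residual across each gap (edge gaps use the nearest residual).
    """
    day = np.asarray(day, dtype=float)
    mean, eofs = eof_basis(neighbors, k)
    recon = reconstruct_diurnal_cycle(day, mean, eofs)
    hours = np.arange(day.size)
    obs = ~np.isnan(day)
    resid = day[obs] - recon[obs]
    offset = np.interp(hours, hours[obs], resid)
    filled = day.copy()
    filled[~obs] = recon[~obs] + offset[~obs]  # observed hours are left untouched
    return filled


# Usage with synthetic data (illustrative values, not from the paper).
hours = np.arange(24)
base = 40 + 15 * np.cos(2 * np.pi * (hours - 8) / 24)
pattern = np.sin(2 * np.pi * hours / 24)
rng = np.random.default_rng(0)
neighbors = base + rng.normal(0, 10, size=(30, 1)) * pattern
truth = base + 12 * pattern
day = truth.copy()
day[[7, 8, 9, 20]] = np.nan  # simulated gaps
print(np.round(fill_gaps(day, neighbors, k=1)[[7, 8, 9, 20]], 1))
```

Because the synthetic neighbors vary along a single mode, `k=1` recovers `truth` at the gap hours to floating-point precision. With real records, the choice of spatiotemporal neighbors and of `k` would need tuning, and this sketch does not reproduce the cross-validation reported above.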
In general, the present work suggests that the DCCEOF method is realistic and robust enough to handle missing data in time series of geophysical parameters with significant diurnal variability, and, owing to its self-consistent design, it can be expected to apply to other data sets with similar gaps.