Using wavelet transform and dynamic time warping to identify the limitations of the CNN model as an air quality forecasting system

Ebrahim Eslami,Yunsoo Choi,Yannic Lops,Alqamah Sayeed,Ahmed Khan Salman
DOI: https://doi.org/10.5194/gmd-2019-346
2020-03-09
Abstract:Abstract. As the deep learning algorithm has become a popular data analytic technique, atmospheric scientists should have a balanced perception of its strengths and limitations so that they can provide a powerful analysis of complex data with well-established procedures. Despite the enormous success of the algorithm in numerous applications, certain issues related to its applications in air quality forecasting (AQF) require further analysis and discussion. This study addresses significant limitations of an advanced deep learning algorithm, the convolutional neural network (CNN), in two common applications: (i) a real-time AQF model, and (ii) a post-processing tool in a dynamical AQF model, the Community Multi-scale Air Quality Model (CMAQ). In both cases, the CNN model shows promising accuracy for ozone prediction 24 hours in advance in both the United States and South Korea (with an overall index of agreement exceeding 0.8). For the first case, we use the wavelet transform to determine the reasons behind the poor performance of CNN during the nighttime, cold months, and high ozone episodes. We find that when fine wavelet modes (hourly and daily) are relatively weak or when coarse wavelet modes (weekly) are strong, the CNN model produces less accurate forecasts. For the second case, we use the dynamic time warping (DTW) distance analysis to compare post-processed results with their CMAQ counterparts (as a base model). For CMAQ results that show a consistent DTW distance from the observation, the post-processing approach properly addresses the modeling bias with predicted IOAs exceeding 0.85. When the DTW distance of CMAQ-vs-observation is irregular, the post-processing approach is unlikely to perform satisfactorily. Awareness of the limitations in CNN models will enable scientists to develop more accurate regional or local air quality forecasting systems by identifying the affecting factors in high concentration episodes.
What problem does this paper attempt to address?