Deep learning spatiotemporal air pollution data in China using data fusion

Xiaolu Zhou, Weitian Tong, Lixin Li
DOI: https://doi.org/10.1007/s12145-020-00470-9
2020-06-01
Earth Science Informatics
Abstract: An efficient and effective spatiotemporal prediction algorithm for PM2.5 (i.e. particulate matter with a diameter of less than 2.5 micrometers) is urgently needed to study the distribution of PM2.5 over a continuous spatiotemporal domain, which not only helps to make scientific decisions on the prevention and control of PM2.5 pollution but also promotes meaningful assessment of the quantitative relationship between adverse health effects and PM2.5 concentrations over time. Existing spatiotemporal interpolation algorithms are usually based on the assumption that interpolation models follow explicit and simple mathematical descriptions. Unfortunately, the real world does not really follow these perfect mathematical models. Combining data fusion techniques and a Long Short-Term Memory (LSTM) recurrent neural network (RNN), we present a novel spatiotemporal interpolation model, which is able to achieve high estimation accuracies over a long time period and a large area. By fusing the daily PM2.5 data, meteorological data, elevation data, and land-use data collected from China in 2016, four experiments were conducted in this study to evaluate the efficiency and effectiveness of the proposed approach. Results showed that applying LSTM RNN on the fused dataset can achieve consistent and high accuracy in different geographies.
geosciences, multidisciplinary; computer science, interdisciplinary applications
What problem does this paper attempt to address?
This paper aims to develop an efficient and effective spatiotemporal prediction algorithm for the distribution of PM2.5 concentrations across China. Specifically, it builds a new spatiotemporal interpolation model that combines data fusion (of PM2.5, meteorological, elevation, and land-use data) with a Long Short-Term Memory (LSTM) recurrent neural network (RNN) to achieve high-accuracy estimation over long time periods and large areas. Such a model supports scientific decision-making on the prevention and control of PM2.5 pollution, and it enables a meaningful assessment of the quantitative relationship between adverse health effects and PM2.5 concentrations. The paper notes that existing spatiotemporal interpolation algorithms usually assume that the interpolation model follows an explicit and simple mathematical description, whereas the real world does not always conform to such idealized models. A deep learning method, here an LSTM RNN, can instead account for hidden factors automatically when modeling PM2.5 data, which improves prediction accuracy. The efficiency and effectiveness of the proposed method were evaluated in four experiments on daily PM2.5, meteorological, elevation, and land-use data collected in China in 2016. The results show that applying the LSTM RNN to the fused dataset achieves consistent and high accuracy across different geographic regions.
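The fusion step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the field names (`temp_c`, `wind_ms`, `elevation_m`, `urban_frac`), the join on a (station, day) key, the window length, and the sample values are all assumptions made for illustration. The sketch joins daily PM2.5 readings with same-day weather and static site attributes, then cuts each station's rows into fixed-length sequences of the kind an LSTM typically takes as input.

```python
from collections import defaultdict


def fuse_records(pm25, weather, static):
    """Join PM2.5 and weather rows on (station, day) and attach static site features."""
    fused = []
    for (station, day), pm in sorted(pm25.items()):
        met = weather.get((station, day))
        site = static.get(station)
        if met is None or site is None:
            continue  # drop days that lack a complete set of inputs
        features = [met["temp_c"], met["wind_ms"],
                    site["elevation_m"], site["urban_frac"]]
        fused.append((station, day, features, pm))
    return fused


def make_windows(fused, steps):
    """Cut each station's fused rows into sequences of `steps` consecutive rows.

    The target is the PM2.5 value on the last day of each window.
    Gaps between days are not checked here (a simplifying assumption).
    """
    by_station = defaultdict(list)
    for station, day, features, pm in fused:
        by_station[station].append((day, features, pm))

    samples = []
    for station, rows in by_station.items():
        rows.sort(key=lambda r: r[0])  # order by day
        for end in range(steps, len(rows) + 1):
            window = rows[end - steps:end]
            x = [feat for _, feat, _ in window]  # shape: steps x n_features
            y = window[-1][2]
            samples.append((station, x, y))
    return samples


# Hypothetical toy inputs; the values are invented for illustration only.
pm25 = {("A", 1): 50.0, ("A", 2): 60.0, ("A", 3): 55.0, ("B", 1): 30.0}
weather = {
    ("A", 1): {"temp_c": 5.0, "wind_ms": 2.0},
    ("A", 2): {"temp_c": 6.0, "wind_ms": 1.5},
    ("A", 3): {"temp_c": 4.0, "wind_ms": 3.0},
}
static = {
    "A": {"elevation_m": 44.0, "urban_frac": 0.8},
    "B": {"elevation_m": 120.0, "urban_frac": 0.3},
}

fused = fuse_records(pm25, weather, static)
samples = make_windows(fused, steps=2)
```

Each entry of `samples` pairs a station with a `steps`-by-4 feature sequence and one PM2.5 target. Station B is dropped because it has no weather record in this toy data. A real pipeline would also need feature scaling and some handling of missing days before training a network.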