A Spatiotemporal Approach for Traffic Data Imputation with Complicated Missing Patterns

Huiping Li, Meng Li, Xi Lin, Fang He, Yinhai Wang
DOI: https://doi.org/10.1016/j.trc.2020.102730
IF: 9.022
2020-01-01
Transportation Research Part C Emerging Technologies
Abstract: With the advent of intelligent transportation systems (ITS), spatiotemporal traffic data have gained growing importance in real-time traffic monitoring, prediction, and control. In practice, however, data collection devices often malfunction because of unpredictable disruptions, resulting in the so-called "missing value problem." In realistic cases, these disruptions are often associated with key events (e.g., power cuts and natural disasters); combined with other disruptions, they can produce complicated missing patterns in which values are missing both randomly and completely. To perform imputation under such complicated missing patterns, we propose a hybrid spatiotemporal method that exploits time series properties with the "prophet" model and captures spatial residual information with an iterative random forest model. The method first applies the temporal part to fill the missing values and then applies the spatial part to estimate the residual component of the missing values. The results of the two components are combined into the final imputations. Using the PeMS freeway dataset (PeMS, 2019) and an urban road dataset under extensive artificially designed scenarios, including randomly missing, clustered non-completely missing, and completely missing patterns, we compare our proposed approach with existing techniques such as K-Nearest Neighbor (KNN), Seasonal-Trend decomposition using Loess (STL), Bayesian tensor decomposition, and Denoising AutoEncoder (DAE). The test results indicate that the hybrid method achieves the best imputation quality for most missing patterns, particularly completely missing and hybrid patterns. Furthermore, the hybrid model still performs well under extreme missing rates as high as 0.9, which demonstrates the robustness of the model in extreme situations.
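The two-stage structure described in the abstract (a temporal fill followed by a spatial residual correction, summed into the final imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: a per-period seasonal mean stands in for the "prophet" model, and an average of the other sensors' residuals at the same time step stands in for the iterative random forest. The names `seasonal_profile` and `hybrid_impute`, the `period` parameter, and the toy data are all hypothetical. The sketch only handles scattered gaps; a sensor with no observations at all would fall back to a zero temporal profile here.

```python
from statistics import mean


def seasonal_profile(series, period):
    """Temporal stand-in (NOT Prophet): mean observed value at each cycle position."""
    buckets = [[] for _ in range(period)]
    for t, value in enumerate(series):
        if value is not None:
            buckets[t % period].append(value)
    observed = [v for v in series if v is not None]
    overall = mean(observed) if observed else 0.0
    # Fall back to the overall mean when a cycle position was never observed.
    return [mean(b) if b else overall for b in buckets]


def hybrid_impute(data, period):
    """Fill None entries in `data` (one equal-length list per sensor).

    Final imputation = temporal estimate + spatial residual estimate.
    """
    n_sensors, n_steps = len(data), len(data[0])
    profiles = [seasonal_profile(series, period) for series in data]
    imputed = [list(series) for series in data]
    for t in range(n_steps):
        # Residuals (observed minus temporal estimate) of sensors observed at time t.
        residuals = [
            data[i][t] - profiles[i][t % period]
            for i in range(n_sensors)
            if data[i][t] is not None
        ]
        # Spatial stand-in (NOT iterative random forest): mean residual of other sensors.
        spatial = mean(residuals) if residuals else 0.0
        for i in range(n_sensors):
            if data[i][t] is None:
                imputed[i][t] = profiles[i][t % period] + spatial
    return imputed


# Toy example: two sensors, a cycle of length 2, one missing reading.
toy = [
    [10, 20, 10, 20, 12, 22],
    [30, 40, 30, 40, 32, None],
]
filled = hybrid_impute(toy, period=2)
```

In this toy case the missing reading is filled with the second sensor's seasonal mean for that cycle position, shifted by how far the first sensor deviates from its own seasonal mean at the same time step.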