Accurate Recovery of Internet Traffic Data: A Sequential Tensor Completion Approach

Kun Xie, Lele Wang, Xin Wang, Gaogang Xie, Jigang Wen, Guangxing Zhang
DOI: https://doi.org/10.1109/tnet.2018.2797094
2018-01-01
IEEE/ACM Transactions on Networking
Abstract: Inferring the traffic volume of a whole network from partial traffic measurements is becoming increasingly critical for various network engineering tasks, such as capacity planning and anomaly detection. Previous studies indicate that matrix completion is a possible solution to this problem. However, because a 2-D matrix cannot sufficiently capture the spatial-temporal features of traffic data, these approaches fail when the data missing ratio is high. To fully exploit the hidden spatial-temporal structures of the traffic data, this paper models the traffic data as a 3-way traffic tensor and formulates traffic data recovery as a low-rank tensor completion problem. However, the high computational complexity of conventional tensor completion algorithms prevents their practical application to traffic data recovery. To reduce the computation cost, we propose a novel sequential tensor completion algorithm, which efficiently reuses the tensor decomposition result obtained from previous traffic data to derive the tensor decomposition when new data arrive. Furthermore, to better capture changes in data correlation over time, we propose a dynamic sequential tensor completion algorithm. To the best of our knowledge, we are the first to propose sequential tensor completion algorithms to significantly speed up the traffic data recovery process. This makes it practical to model Internet traffic as a tensor, so that the hidden structures of the traffic data can be exploited for more accurate missing data inference. We have conducted extensive simulations with real traffic trace data as the input. The simulation results demonstrate that our algorithms achieve significantly better performance than existing tensor and matrix completion algorithms from the literature, even when the data missing ratio is high.
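To make the low-rank tensor completion formulation concrete, the following is a minimal sketch of a generic batch CP (CANDECOMP/PARAFAC) completion loop in NumPy. It is not the authors' sequential or dynamic algorithm, whose details the abstract does not give; it only illustrates the kind of conventional baseline that is recomputed from scratch on each new batch of data. The tensor shape, the rank, the function names, and the fill-then-refit scheme are all assumptions chosen for illustration.

```python
import numpy as np

def khatri_rao(a, b):
    # Column-wise Kronecker product of (I x R) and (J x R) -> (I*J x R).
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def unfold(t, mode):
    # Mode-n unfolding; remaining axes keep their original order.
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def cp_reconstruct(factors):
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_complete(observed, mask, rank, n_iter=500, seed=0):
    """Hypothetical helper: fill missing entries with the current estimate,
    then run one alternating-least-squares sweep, and repeat."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in observed.shape]
    for _ in range(n_iter):
        # Keep observed values; impute the rest from the current model.
        filled = np.where(mask, observed, cp_reconstruct(factors))
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(filled, mode) @ kr @ np.linalg.pinv(gram)
    return cp_reconstruct(factors)

# Synthetic rank-2 tensor standing in for a 3-way traffic tensor
# (the sizes and the 30% missing ratio are illustrative assumptions).
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((10, 2)) for _ in range(3)]
truth = cp_reconstruct(true_factors)
mask = rng.random(truth.shape) > 0.3          # True where an entry is observed
recovered = cp_complete(truth * mask, mask, rank=2)

missing = ~mask
rel_err = np.linalg.norm((recovered - truth)[missing]) / np.linalg.norm(truth[missing])
print(f"relative error on missing entries: {rel_err:.4f}")
```

Every call to `cp_complete` restarts from random factors, which is the cost that a sequential scheme of the kind the abstract describes aims to avoid by reusing the previous decomposition.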