Tracking Matrix Approximation over Distributed Sliding Windows

Haida Zhang, Zengfeng Huang, Zhewei Wei, Wenjie Zhang, Xuemin Lin
DOI: https://doi.org/10.1109/icde.2017.133
2017-01-01
Abstract: In many modern applications, input data is represented as matrices and often arrives continuously. The ability to summarize and approximate data matrices in a streaming fashion has become a common requirement in many emerging environments. In these applications, input data is usually generated at multiple distributed sites, and simply centralizing all data is often infeasible. Therefore, novel algorithmic techniques are required. Furthermore, in most of these applications, queries must be answered based solely on the recently observed data points (e.g., data collected over the last hour/day/month), which makes the problem even more challenging. In this paper, we propose to study the problem of tracking matrix approximations over distributed sliding windows. In this problem, there are m distributed sites, each observing a stream of d-dimensional data points. The goal is to continuously track a small matrix B as an approximation to $A_w$, the matrix consisting of the data points in the union of the streams that arrived during the last W time units. The quality of the approximation is measured by the covariance error $\|A_w^T A_w - B^T B\| / \|A_w\|_F^2$ [1], and the primary goal is to minimize communication while providing provable error guarantees. We propose novel communication-efficient algorithms for this problem. Our sampling-based algorithms continuously track a weighted sample of rows according to their squared norms, which generalizes and simplifies the sampling techniques in [2]. We also propose deterministic tracking algorithms that require only one-way communication and provide better error guarantees. All algorithms have provable guarantees, and extensive experimental studies on real and synthetic datasets validate our theoretical claims and demonstrate the efficiency of these algorithms.
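The window, the covariance error, and the squared-norm sampling idea in the abstract can be illustrated with a minimal, centralized sketch. It assumes all rows of the current window are already collected in one place, so it does not model the m sites or the communication protocol, which is the paper's actual contribution. It also assumes the spectral norm in the numerator of the covariance error and uses standard i.i.d. length-squared sampling with rescaling, not the paper's own sampling scheme. The function names `window_rows`, `covariance_error`, and `norm_sampling_sketch` are hypothetical.

```python
import numpy as np


def window_rows(stream, now, W):
    """Stack the rows whose timestamp t satisfies now - W < t <= now.

    `stream` is a list of (timestamp, row) pairs, e.g. a hypothetical
    centralized view of the union of all sites' streams.
    """
    return np.array([row for t, row in stream if now - W < t <= now])


def covariance_error(A_w, B):
    """Normalized covariance error ||A_w^T A_w - B^T B||_2 / ||A_w||_F^2."""
    diff = A_w.T @ A_w - B.T @ B
    return float(np.linalg.norm(diff, 2) / np.linalg.norm(A_w, "fro") ** 2)


def norm_sampling_sketch(A_w, ell, rng):
    """Sample ell rows i.i.d. with probability ||a_i||^2 / ||A_w||_F^2.

    Each sampled row is rescaled by 1 / sqrt(ell * p_i), so that
    E[B^T B] = A_w^T A_w and ||B||_F^2 = ||A_w||_F^2 (up to rounding).
    Assumes A_w has at least one nonzero row.
    """
    sq_norms = np.einsum("ij,ij->i", A_w, A_w)  # squared norm of each row
    p = sq_norms / sq_norms.sum()               # sampling probabilities
    idx = rng.choice(len(A_w), size=ell, replace=True, p=p)
    scale = 1.0 / np.sqrt(ell * p[idx])         # rows with p_i = 0 are never drawn
    return A_w[idx] * scale[:, None]


# Usage: 100 time steps of 5-dimensional rows, window of the last 40 units.
rng = np.random.default_rng(0)
stream = [(t, rng.standard_normal(5)) for t in range(100)]
A_w = window_rows(stream, now=99, W=40)
B = norm_sampling_sketch(A_w, ell=20, rng=rng)
print(A_w.shape, B.shape)  # (40, 5) (20, 5)
print(covariance_error(A_w, B))
```

Because each sampled row is rescaled by $1/\sqrt{\ell p_i}$, $B^T B$ is an unbiased estimate of $A_w^T A_w$ and every sampled row contributes exactly $\|A_w\|_F^2/\ell$ to $\|B\|_F^2$; the expected covariance error typically decreases as `ell` grows.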
What problem does this paper attempt to address?