Continuously Extracting High-Quality Representative Set from Massive Data Streams.

Xiaokang Ji, Xiuli Ma, Ting Huang, Shiwei Tang
DOI: https://doi.org/10.1007/978-3-642-53914-5_8
2013-01-01
Abstract: In many large-scale real-time monitoring applications, hundreds or thousands of streams must be monitored continuously. To ease monitoring, a small set of representatives can be extracted to represent all the streams. A high-quality representative set must be not only representative but also stable. In this paper, we propose a method to continuously extract a high-quality representative set from massive streams. First, we cluster streams based on a core clustering model. The tightness of the core set, meaning that any two streams in it are highly correlated, ensures that the representative set is highly representative. Second, we use topological relationships to force each cluster to be connected in the network from which the streams are generated. Because streams in one cluster are driven by similar underlying mechanisms, the representative set becomes much more stable. By exploiting the tightness of core sets, we can obtain the representative set immediately. Moreover, with local optimization strategies, our method can adjust core clusters very efficiently, which enables real-time response. Experiments on real applications show that our method is efficient and produces a high-quality representative set. © Springer-Verlag 2013.
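The abstract names two constraints on each cluster: tightness (every pair of streams in the core is highly correlated) and connectivity in the network that generates the streams. It does not spell out the algorithm. The block below is a minimal sketch of those two constraints only, assuming Pearson correlation over one fixed window as the similarity measure, a user-chosen threshold `tau`, and a simple greedy growth strategy. It is not the authors' core clustering model, and the names `core_clusters`, `representatives`, `windows`, `topology`, and `tau` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def core_clusters(windows: Dict[str, List[float]],
                  topology: Dict[str, Set[str]],
                  tau: float) -> List[List[str]]:
    """Hypothetical greedy clustering (assumption, not the paper's algorithm).

    A cluster grows from a seed stream. A candidate joins only if
      (1) it is a topology neighbour of some current member, so the cluster
          stays connected in the network, and
      (2) its correlation with EVERY current member is >= tau, so any two
          streams in the cluster are highly correlated (tightness).
    """
    unassigned = set(windows)
    clusters: List[List[str]] = []
    for seed in sorted(windows):
        if seed not in unassigned:
            continue
        cluster = [seed]
        unassigned.discard(seed)
        grew = True
        while grew:
            grew = False
            # Unassigned streams adjacent to at least one member of the cluster.
            frontier = sorted({nb for m in cluster for nb in topology.get(m, set())}
                              & unassigned)
            for cand in frontier:
                if all(pearson(windows[cand], windows[m]) >= tau for m in cluster):
                    cluster.append(cand)
                    unassigned.discard(cand)
                    grew = True
        clusters.append(cluster)
    return clusters


def representatives(clusters: List[List[str]]) -> List[str]:
    """Because every pair in a tight cluster is correlated >= tau, any member
    can stand in for the cluster; this sketch simply takes the first one."""
    return [c[0] for c in clusters]


if __name__ == "__main__":
    # Toy data: four streams on a path-shaped network a - b - c - d.
    windows = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [1, 2, 3, 4, 6],
        "d": [5, 4, 3, 2, 1],
    }
    topology = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    clusters = core_clusters(windows, topology, tau=0.9)
    print(clusters)                   # [['a', 'b', 'c'], ['d']]
    print(representatives(clusters))  # ['a', 'd']
```

In this toy run, `d` is a topology neighbour of `c` but is negatively correlated with the other streams, so it forms its own cluster. The paper's continuous, incremental adjustment of clusters through local optimization is not modeled here; the sketch rebuilds clusters from scratch for a single window.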
What problem does this paper attempt to address?