SSS: an Accurate and Fast Algorithm for Finding Top-k Hot Items in Data Streams

Junzhi Gong, Deyu Tian, Dongsheng Yang, Tong Yang, Tuo Dai, Bin Cui, Xiaoming Li
DOI: https://doi.org/10.1109/bigcomp.2018.00024
2018-01-01
Abstract: Finding top-k hot items in a data stream is a critical problem in big data management. It benefits a range of applications, such as data mining, databases, and network traffic measurement. However, as data streams grow increasingly fast, it becomes more and more challenging to design an accurate and fast algorithm for this problem. Several algorithms exist, including Space-Saving, Frequent, and Lossy Counting, with Space-Saving being the most widely used among them. Unfortunately, none of these existing algorithms achieves high memory efficiency and high accuracy at the same time. In this paper, we propose an enhanced algorithm, named Scoreboard Space-Saving (SSS), which not only achieves much higher accuracy, but also runs at a fast, constant speed. The key idea of SSS is to predict, by scoring, whether each incoming item is a hot item. Experimental results show that the SSS algorithm achieves up to 62.4 times higher accuracy than Space-Saving.
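For context, the baseline that the abstract compares against can be sketched in a few lines. The following is a minimal illustration of the classic Space-Saving algorithm only, not of SSS: the abstract does not describe the scoring mechanism, so it is not reproduced here. The function names `space_saving` and `top_k` and the parameter `m` (number of counters) are assumptions made for this sketch. It also finds the minimum counter with a linear scan for brevity, whereas practical implementations typically use a dedicated structure to do this in constant time.

```python
def space_saving(stream, m):
    """Classic Space-Saving with at most m counters (illustrative sketch)."""
    counters = {}  # item -> estimated count
    for item in stream:
        if item in counters:
            # Monitored item: increment its counter.
            counters[item] += 1
        elif len(counters) < m:
            # Free slot available: start monitoring the item.
            counters[item] = 1
        else:
            # No free slot: evict the item with the smallest count and
            # let the newcomer inherit that count plus one.
            victim = min(counters, key=counters.get)  # O(m) scan, for brevity
            counters[item] = counters.pop(victim) + 1
    return counters


def top_k(counters, k):
    """Return the k items with the largest estimated counts."""
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Because a newcomer inherits the evicted counter's value, estimates can overshoot the true frequency, and a cold item can displace a monitored one. This is the kind of inaccuracy that predicting hotness before admission, as the abstract describes for SSS, aims to reduce.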