OrderSketch: An Unbiased and Fast Sketch for Frequency Estimation of Data Streams

Lu Jie, Chen Hongchang, Sun Penghao, Hu Tao, Zhang Zhen
DOI: https://doi.org/10.1016/j.comnet.2021.108563
IF: 5.493
2021-12-01
Computer Networks
Abstract: Estimating the frequency of each distinct item in a data stream is a fundamental problem in data mining. Existing algorithms are not fast enough, and some improve accuracy only through complex configuration, which places a heavy burden on users. To address these issues, we propose a new sketch, OrderSketch, whose simple structure and operations are easy to understand and use. OrderSketch is significantly faster than existing algorithms while maintaining high accuracy. We prove theoretically that OrderSketch provides unbiased estimates and derive an error bound for the algorithm. To verify the effectiveness and efficiency of OrderSketch, we compare it with five other widely used, high-performing algorithms. Experimental results show that OrderSketch achieves 3 times higher insertion speed than the state-of-the-art work. We have released our source code on GitHub [1].
computer science, information systems; telecommunications; engineering, electrical & electronic; hardware & architecture