Scalable Overspeed Item Detection in Streams

Yuhan Wu, Hanbo Wu, Chengjun Jia, Bo Peng, Ziyun Zhang, Tong Yang, Peiqing Chen, Kaicheng Yang, Bin Cui
DOI: https://doi.org/10.1109/icde60146.2024.00094
2024-01-01
Abstract: In data stream mining, monitoring high-speed users and segregating their excessive use, known as “Overspeed items,” is crucial for preventing system overload and maintaining fairness in messaging and network systems. Current approaches, however, face scalability challenges with large user bases, primarily because their memory requirements grow in proportion to the number of users. We have pinpointed the inefficiency in allocating memory for all users, recognizing that only a small fraction exhibit overspeed behavior at any given time. To address this, we employed sketching, a class of approximate algorithms, and designed the first sketch algorithm for finding Overspeed items, named SpeedSketch: (1) Scalability. SpeedSketch can scale user numbers (saving memory space) by a factor of 6430 while maintaining a low average error rate of 0.1% on real-world datasets. (2) Accuracy. In theory, SpeedSketch stands out as the only sketch algorithm offering a per-user relative error bound. (3) Speed. SpeedSketch is implemented on a high-speed programmable switch with a throughput capacity of 4.8 billion items per second. All code is available on GitHub for reference.
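To make the fixed-memory idea concrete, the following is a minimal Python sketch of a generic approach: a Count-Min sketch that is cleared at the start of each time window and flags a key once its estimated count in the window exceeds a limit. This is not SpeedSketch, whose design is not described in the abstract. The class name `WindowedCountMin`, its parameters (`width`, `depth`, `window`, `limit`), and the fixed-window reset policy are all assumptions made purely for illustration.

```python
import hashlib


class WindowedCountMin:
    """Count-Min sketch reset at each window boundary.

    Illustrative only: NOT SpeedSketch. All names and parameters are hypothetical.
    """

    def __init__(self, width, depth, window, limit):
        self.width = width      # counters per row
        self.depth = depth      # number of hash rows
        self.window = window    # window length, in the same units as timestamps
        self.limit = limit      # items allowed per key per window
        self.rows = [[0] * width for _ in range(depth)]
        self.window_id = None   # index of the window currently being counted

    def _index(self, key, row):
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def offer(self, key, timestamp):
        """Record one item; return True if the key now exceeds the limit."""
        wid = int(timestamp // self.window)
        if wid != self.window_id:
            # New window: clear every counter.
            self.rows = [[0] * self.width for _ in range(self.depth)]
            self.window_id = wid
        estimate = None
        for r in range(self.depth):
            i = self._index(key, r)
            self.rows[r][i] += 1
            count = self.rows[r][i]
            estimate = count if estimate is None else min(estimate, count)
        # Count-Min never underestimates, so a key over the limit is always
        # flagged; hash collisions can cause occasional false positives.
        return estimate > self.limit
```

Memory use here is `width * depth` counters regardless of how many distinct users appear, which is the property the abstract contrasts with per-user allocation. The price is approximation: colliding keys can inflate each other's estimates, so the error depends on the chosen `width` and `depth`.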