Finding Simplex Items in Data Streams

Zhuochen Fan, Jiarui Guo, Xiaodong Li, Tong Yang, Yikai Zhao, Yuhan Wu, Bin Cui, Yanwei Xu, Steve Uhlig, Gong Zhang
DOI: https://doi.org/10.1109/icde55515.2023.00152
2023-01-01
Abstract: In this paper, we propose a new type of item in data streams, called simplex items. A simplex item is one whose frequencies over p consecutive windows can be approximated by a polynomial of degree at most k, where k = 0, 1, 2. These low-order representable simplex items have a wide range of potential applications. For example, when k = 1, items whose frequency shows a clear linear increase or decrease can be leveraged to reduce the running time of a class of machine learning models and to detect network attacks such as distributed denial-of-service (DDoS). To find k-degree simplex items in real time, we propose a novel sketch, X-Sketch, that accurately records simplex items in compact space. The key idea of X-Sketch is to filter out non-simplex items with little memory overhead, and then to monitor the remaining potential simplex items, keeping those items that span more consecutive windows. We conduct extensive experiments, and the results show that the F1 Score of X-Sketch is on average 68.6%, 57.9%, and 42.2% higher than the baseline solution for k = 0, 1, 2, respectively. Finally, we provide a case study that applies X-Sketch to "accelerate" two machine learning models through end-to-end experiments. We have released our source code on GitHub.
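To make the definition of a simplex item concrete, the following is a minimal offline sketch of the check the abstract describes: fit a polynomial of degree at most k to an item's frequencies over p consecutive windows and accept the item if the fit is close enough. It is not X-Sketch and does not reflect the authors' implementation. The function names, the use of mean squared error as the closeness measure, and the `max_error` threshold are all assumptions made for illustration; the abstract does not state the exact approximation criterion.

```python
from fractions import Fraction


def fit_polynomial(freqs, k):
    """Least-squares fit of a polynomial of degree at most k.

    freqs[x] is the item's frequency in window x (x = 0 .. p-1).
    Returns the coefficients [a0, a1, ..., ak] as exact Fractions.
    """
    p = len(freqs)
    n = k + 1
    if p < n:
        raise ValueError("need at least k + 1 windows to fit degree k")
    # Normal equations (V^T V) a = V^T y for the Vandermonde matrix V.
    A = [[Fraction(sum(x ** (i + j) for x in range(p))) for j in range(n)]
         for i in range(n)]
    b = [Fraction(sum((x ** i) * y for x, y in enumerate(freqs)))
         for i in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [ar - factor * ac for ar, ac in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def is_simplex(freqs, k, max_error):
    """Assumed criterion: mean squared error of the fit is <= max_error."""
    coeffs = fit_polynomial(freqs, k)
    squared_errors = (
        (sum(c * x ** i for i, c in enumerate(coeffs)) - y) ** 2
        for x, y in enumerate(freqs)
    )
    return sum(squared_errors) / len(freqs) <= max_error


if __name__ == "__main__":
    rising = [10, 20, 30, 40, 50]       # frequency grows linearly
    print(is_simplex(rising, 1, 0))     # True: a line fits exactly
    print(is_simplex(rising, 0, 1))     # False: a constant does not
```

This brute-force check stores every window's frequency for every item, which is exactly the cost a compact sketch is meant to avoid; it serves only to show what "approximated by a polynomial of degree at most k" means for k = 0, 1, 2.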