MimoSketch: A Framework to Mine Item Frequency on Multiple Nodes with Sketches

Yuchen Xu,Wenfei Wu,Bohan Zhao,Tong Yang,Yikai Zhao
DOI: https://doi.org/10.1145/3580305.3599433
2023-01-01
Abstract:We abstract a MIMO scenario in distributed data stream mining, where a stream of multiple items is mined by multiple nodes. We design a framework named MimoSketch for the MIMO-specific scenario, which improves the fundamental mining task of item frequency estimation. MimoSketch consists of an algorithm design and a policy to schedule items to nodes. MimoSketch's algorithm applies random counting to preserve a mathematically proven unbiasedness property, which makes it friendly to the aggregate query on multiple nodes; its memory layout is dynamically adaptive to the runtime item size distribution, which maximizes the estimation accuracy by storing more items. MimoSketch's scheduling policy balances items among nodes, avoiding nodes being overloaded or underloaded, which improves the overall mining accuracy. Our prototype and evaluation show that our algorithm can improve the item frequency estimation accuracy by an order of magnitude compared with the state-of-the-art solutions, and the scheduling policy further promotes the performance in MIMO scenarios.
What problem does this paper attempt to address?