M4: A Framework for Per-Flow Quantile Estimation

Siyuan Dong, Zhuochen Fan, Tianyu Bai, Tong Yang, Hanyu Xue, Peiqing Chen, Yuhan Wu
DOI: https://doi.org/10.1109/icde60146.2024.00364
2024-01-01
Abstract: Quantile estimation has grown in importance because of its many practical applications. Recent research has moved from estimating quantiles for a single data stream to designing data structures that estimate quantiles concurrently for many sub-streams, also known as flows. This paper introduces M4, a novel framework for accurately estimating per-flow quantiles in data streams. M4 is versatile: it can be combined with a wide range of single-flow quantile estimation algorithms, enabling each of them to perform per-flow estimation. The framework takes a sketch-based approach that records and extracts distribution information in a space-efficient way. M4 incorporates two techniques, MINIMUM and SUM. MINIMUM minimizes the noise that hash collisions with other flows introduce into a flow's estimate, while SUM efficiently categorizes flows by size and tailors the treatment strategy to each category. We demonstrate M4 on three single-flow quantile estimation algorithms (DDSketch, t-digest, and ReqSketch), detailing how MINIMUM and SUM are implemented for each. We provide a theoretical proof that M4 delivers high accuracy while using limited memory. We also conduct extensive experiments evaluating M4's accuracy and speed. Across all three example algorithms, M4 is significantly more accurate than two comparison frameworks for per-flow quantile estimation while maintaining comparable speed.
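The abstract does not spell out how MINIMUM works internally. The following is a minimal sketch of the collision-noise argument, assuming a Count-Min-style layout: several hash rows, each cell holding a DDSketch-like log-bucketed histogram, with a flow's distribution recovered by taking the per-bucket minimum over the cells it hashes to. The class names (`LogHistogram`, `MinGridSketch`), the CRC32-based hashing, and the parameters `rows`, `cols`, and `alpha` are hypothetical choices for illustration, not the paper's implementation, and the SUM technique is omitted entirely.

```python
import math
import zlib
from collections import Counter


class LogHistogram:
    """DDSketch-style log-bucketed histogram for positive values (illustrative).

    A value v lands in bucket ceil(log_gamma(v)); the bucket's representative
    value is within relative error alpha of every value in that bucket.
    """

    def __init__(self, alpha=0.01):
        self.gamma = (1 + alpha) / (1 - alpha)
        self.counts = Counter()

    def key(self, value):
        return math.ceil(math.log(value) / math.log(self.gamma))

    def add(self, value):
        self.counts[self.key(value)] += 1


class MinGridSketch:
    """Hypothetical M4-like grid: `rows` hash rows with `cols` histograms each."""

    def __init__(self, rows=3, cols=1024, alpha=0.01):
        self.rows, self.cols = rows, cols
        self.gamma = (1 + alpha) / (1 - alpha)
        self.grid = [[LogHistogram(alpha) for _ in range(cols)] for _ in range(rows)]

    def _cell(self, row, flow):
        # One hash per row: CRC32 of "row:flow" (an assumption, not the paper's hash).
        return self.grid[row][zlib.crc32(f"{row}:{flow}".encode()) % self.cols]

    def insert(self, flow, value):
        # Every row records the value in the flow's cell for that row.
        for r in range(self.rows):
            self._cell(r, flow).add(value)

    def merged_counts(self, flow):
        """Assumed MINIMUM-style merge: per-bucket minimum over the flow's cells.

        Colliding flows can only add counts to a shared cell, so each row
        over-estimates every bucket; the minimum is the tightest of these bounds.
        """
        cells = [self._cell(r, flow).counts for r in range(self.rows)]
        keys = set().union(*cells)
        merged = {k: min(c[k] for c in cells) for k in keys}
        return {k: v for k, v in merged.items() if v > 0}

    def quantile(self, flow, q):
        merged = self.merged_counts(flow)
        total = sum(merged.values())
        if total == 0:
            return None
        rank = q * (total - 1)  # DDSketch-style rank convention
        seen = 0
        for k in sorted(merged):
            seen += merged[k]
            if seen > rank:
                break
        # Representative value of bucket k, within relative error alpha.
        return 2 * self.gamma ** k / (self.gamma + 1)
```

For example, inserting the values 1 to 1000 into flow `"a"` alone makes the merged histogram equal that flow's exact histogram, so `quantile("a", 0.5)` lands within the 1% relative-error bound of 500. Adding other flows can only raise the per-bucket minimum, never lower it below the true count; how far it rises depends on whether some other flow shares the queried flow's cell in every row.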