CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space

Haoquan Guan, Ziling Chen, Shaoxu Song
DOI: https://doi.org/10.14778/3611479.3611491
IF: 2.5
2023-07-01
Proceedings of the VLDB Endowment
Abstract: Median absolute deviation (MAD), the median of the absolute deviations from the median, has been found useful in various applications such as outlier detection. Together with the median, MAD is more robust to abnormal data than the mean and standard deviation (SD). Unfortunately, existing methods return only an approximate MAD that may be far from the exact one, and thus mislead downstream applications. Computing the exact MAD is costly, however, especially in space, since the straightforward approach stores the entire dataset in memory. In this paper, we propose COnstruction-REfinement Sketch (CORE-Sketch) for computing the exact MAD. The idea is to construct a sketch within limited space, and gradually refine the sketch to find the MAD element, i.e., the element whose distance to the median is exactly equal to MAD. Mergeability and convergence of the method are analyzed, ensuring the correctness of the proposal and enabling parallel computation. Extensive experiments demonstrate that CORE-Sketch uses significantly less space than the No-Sketch baseline that stores the entire dataset in memory, and has time and space costs comparable to the DD-Sketch method for approximate MAD.
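To make the definition concrete, the following is a minimal sketch of the naive in-memory computation that the abstract describes as costly in space. It is not CORE-Sketch itself; the function name `exact_mad` and the sample data are illustrative assumptions. It also assumes Python's `statistics.median` convention, which averages the two middle values for even-length input and may differ from the paper's definition of the median.

```python
import statistics


def exact_mad(values):
    """Exact MAD computed by holding the whole dataset in memory.

    Illustrative only: this is the straightforward approach, not the
    sketch-based method proposed in the paper.
    """
    data = list(values)  # O(n) space: every element is kept
    if not data:
        raise ValueError("MAD is undefined for an empty dataset")
    med = statistics.median(data)
    # Absolute deviation of each element from the median.
    deviations = [abs(x - med) for x in data]
    # MAD is the median of those deviations.
    return statistics.median(deviations)


# Hypothetical sample with one abnormal value (100).
sample = [1, 2, 3, 4, 100]
mad = exact_mad(sample)
```

In this sample the median is 3 and the MAD is 1, while the mean is 22, which illustrates the robustness to abnormal data mentioned in the abstract.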
computer science, information systems, theory & methods
What problem does this paper attempt to address?