Fast Rotation Kernel Density Estimation over Data Streams
Runze Lei, Pinghui Wang, Rundong Li, Peng Jia, Junzhou Zhao, Xiaohong Guan, Chao Deng
DOI: https://doi.org/10.1145/3447548.3467356
2021-01-01
Abstract: Kernel density estimation is a powerful tool that is widely used in many important real-world applications such as anomaly detection and statistical learning. Unfortunately, current kernel methods suffer from high computational or space costs when dealing with large-scale, high-dimensional datasets, especially when the data arrive as a stream. Although sketch methods have been designed for kernel density estimation over data streams, they still incur high computational costs. To address this problem, we propose a novel Rotation Kernel, which is based on a Rotation Hash method and is much faster to compute. To achieve memory-efficient kernel density estimation over data streams, we design RKD-Sketch, a method that compresses high-dimensional data streams into a small array of integer counters. We conduct extensive experiments on both synthetic and real-world datasets, and the results demonstrate that RKD-Sketch uses up to 216 times less computation and up to 104 times less space than state-of-the-art methods. Furthermore, we apply our Rotation Kernel to active learning. Results show that our method achieves up to 256 times speedup and up to 13 times space savings while reaching the same accuracy as the baseline methods.
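
To make the idea of compressing a stream into a small array of integer counters concrete, the following is a minimal sketch of a generic hash-based density sketch. It is not the Rotation Hash or RKD-Sketch from the paper, whose details the abstract does not give. As a stand-in it assumes sign random projections as the hash function, and the class name `CounterSketch` and the parameters `rows`, `bits`, and `seed` are hypothetical choices made for illustration only.

```python
import random


class CounterSketch:
    """Generic LSH counter-array density sketch.

    Assumption: uses sign random projections as the hash function.
    This is an illustrative stand-in, NOT the paper's Rotation Hash.
    """

    def __init__(self, dim, rows=50, bits=4, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.bits = bits
        # For each row, draw `bits` random hyperplanes in `dim` dimensions.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        # Each row holds 2**bits integer counters, one per hash bucket.
        self.counters = [[0] * (1 << bits) for _ in range(rows)]
        self.n = 0  # number of stream items seen so far

    def _bucket(self, row, x):
        # Concatenate the sign bits of the projections into a bucket index.
        idx = 0
        for plane in self.planes[row]:
            dot = sum(p * v for p, v in zip(plane, x))
            idx = (idx << 1) | (1 if dot >= 0 else 0)
        return idx

    def add(self, x):
        # Streaming update: one counter increment per row, then x is discarded.
        for r in range(self.rows):
            self.counters[r][self._bucket(r, x)] += 1
        self.n += 1

    def query(self, q):
        # Average, over rows, of the fraction of stream items that share
        # q's bucket. This estimates the mean hash-collision probability
        # between q and the stream, which acts as a kernel density.
        if self.n == 0:
            return 0.0
        hits = sum(self.counters[r][self._bucket(r, q)] for r in range(self.rows))
        return hits / (self.rows * self.n)
```

Memory use in this sketch is `rows * 2**bits` integers regardless of stream length, and each update or query costs `rows * bits` dot products. A query near a dense region of the stream should return a larger value than a query pointing in the opposite direction, since the latter rarely shares a bucket with any stream item.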