Anomaly Detection in Hierarchical Data Streams under Unknown Models

Sattar Vakili, Qing Zhao, Chang Liu, Chen-Nee Chuah
DOI: https://doi.org/10.48550/arXiv.1709.03573
IF: 5.414
2017-09-11
Machine Learning
Abstract: We consider the problem of detecting a few targets among a large number of hierarchical data streams. The data streams are modeled as random processes with unknown and potentially heavy-tailed distributions. The objective is an active inference strategy that determines, sequentially, which data stream to collect samples from in order to minimize the sample complexity under a reliability constraint. We propose an active inference strategy that induces a biased random walk on the tree-structured hierarchy based on confidence bounds of sample statistics. We then establish its order optimality in terms of both the size of the search space (i.e., the number of data streams) and the reliability requirement. The results find applications in hierarchical heavy hitter detection, noisy group testing, and adaptive sampling for active learning, classification, and stochastic root finding.
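The idea of a confidence-bound-driven random walk on a tree can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a simplified illustration under assumptions that the abstract does not state: a complete binary tree, a single target leaf, Gaussian noise with known scale sigma (the paper allows unknown, heavy-tailed distributions), and a node mean that is +gap when the node covers the target and -gap otherwise. The names make_sampler, confidence_test, and random_walk_search are hypothetical.

```python
import math
import random


def make_sampler(depth, target, gap=1.0, sigma=0.5, seed=0):
    """Hypothetical stream model (an assumption, not from the paper):
    node (level, idx) has mean +gap if it covers the target leaf, else -gap."""
    rng = random.Random(seed)

    def sample(level, idx):
        # The ancestor of leaf `target` at `level` has index target >> (depth - level).
        covers = (target >> (depth - level)) == idx
        return rng.gauss(gap if covers else -gap, sigma)

    return sample


def confidence_test(sample, level, idx, sigma, delta, counter):
    """Sample one node until a confidence interval around the running mean
    excludes 0. Returns True if the node appears to contain the target."""
    total, t = 0.0, 0
    while True:
        t += 1
        total += sample(level, idx)
        counter[0] += 1  # count every sample drawn
        # Sub-Gaussian confidence radius with a union bound over t.
        radius = sigma * math.sqrt(2.0 * math.log(4.0 * t * t / delta) / t)
        mean = total / t
        if mean - radius > 0:
            return True
        if mean + radius < 0:
            return False


def random_walk_search(sample, depth, sigma, delta_step=0.1,
                       delta_final=0.01, max_steps=10_000):
    """Walk the tree (depth >= 1): descend into a child that tests positive,
    otherwise back up to the parent. A leaf is declared the target only
    after passing a stricter test. Returns (leaf index or None, samples used)."""
    counter = [0]
    level, idx = 0, 0  # start at the root
    for _ in range(max_steps):
        if level == depth:
            # Leaf: confirm with the tighter reliability level.
            if confidence_test(sample, level, idx, sigma, delta_final, counter):
                return idx, counter[0]
            level, idx = level - 1, idx // 2
            continue
        left, right = 2 * idx, 2 * idx + 1
        if confidence_test(sample, level + 1, left, sigma, delta_step, counter):
            level, idx = level + 1, left
        elif confidence_test(sample, level + 1, right, sigma, delta_step, counter):
            level, idx = level + 1, right
        elif level > 0:
            # Neither child looks positive: an earlier step was likely wrong.
            level, idx = level - 1, idx // 2
    return None, counter[0]


if __name__ == "__main__":
    sampler = make_sampler(depth=6, target=37)
    print(random_walk_search(sampler, depth=6, sigma=0.5))
```

Because each local test errs with small probability, every step moves toward the target with probability greater than one half, which is what biases the walk. In this sketch the number of nodes visited grows with the tree depth (logarithmic in the number of leaves) rather than with the number of leaves, and only the final leaf test uses the stricter reliability level delta_final.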