Distance-Based Outlier Query Optimization in Apache IoTDB

Yunxiang Su, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang
DOI: https://doi.org/10.14778/3681954.3681962
IF: 2.5
2024-07-01
Proceedings of the VLDB Endowment
Abstract: While outlier detection has been widely studied over streaming data, querying outliers in time series databases has been largely overlooked. Apache IoTDB, an open-source time series database, employs LSM-tree based storage to support write-intensive workloads, yet this storage structure hampers outlier query performance. In the system, data points of a time series may be stored in multiple files with overlapping time ranges, owing to data that arrive with long delays; streaming outlier detection simply discards such data. Given the overlapping time ranges, it is not possible to detect outliers in each file independently and merge them as the result. In this paper, we focus on optimizing the efficiency of distance-based outlier queries in Apache IoTDB, taking into account the overlapping files caused by delayed data. We propose to utilize bucket statistics of the values stored in files. Upper and lower bounds on the neighbor counts of data points are derived over buckets and overlapping files for efficient pruning. Extensive experiments demonstrate the efficiency of our proposal in the LSM-tree based time series database, Apache IoTDB, compared to existing outlier detection methods designed for data streams.
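The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or any Apache IoTDB API. It assumes the common distance-based definition (a point is an outlier if fewer than k other points lie within distance r of its value), one-dimensional values, buckets of width r, and files modeled as plain Python lists with no duplicate or overwritten points. The names `bucket_counts` and `query_outliers` are hypothetical.

```python
import math
from collections import Counter


def bucket_counts(values, r):
    """Per-file statistics: how many values fall in each bucket of width r."""
    return Counter(math.floor(v / r) for v in values)


def query_outliers(files, r, k):
    """Return the sorted values that have fewer than k neighbors within distance r.

    files: list of lists of numeric values (a stand-in for overlapping data files).
    """
    # Bucket counts are additive, so statistics from overlapping files can be
    # merged without reading the individual points first (assumes no duplicates).
    merged = Counter()
    for values in files:
        merged.update(bucket_counts(values, r))

    all_values = [v for values in files for v in values]
    outliers = []
    for v in all_values:
        b = math.floor(v / r)
        # Lower bound: every other point in the same bucket is closer than r.
        lower = merged[b] - 1
        # Upper bound: any neighbor within r must lie in bucket b-1, b, or b+1.
        upper = merged[b - 1] + merged[b] + merged[b + 1] - 1
        if lower >= k:
            continue  # pruned: certainly an inlier
        if upper < k:
            outliers.append(v)  # pruned: certainly an outlier
            continue
        # Bounds are inconclusive: fall back to an exact neighbor count.
        exact = sum(1 for u in all_values if abs(u - v) <= r) - 1
        if exact < k:
            outliers.append(v)
    return sorted(outliers)


if __name__ == "__main__":
    # Three hypothetical files whose contents interleave.
    files = [[1.0, 1.2, 1.4, 1.1], [1.3, 9.0], [5.0, 5.9, 6.1]]
    print(query_outliers(files, r=1.0, k=2))
```

In this example the five values near 1 are accepted as inliers from the lower bound alone, 9.0 is rejected from the upper bound alone, and only the values near 5 and 6 need the exact check.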
computer science, information systems, theory & methods