TSDDISCOVER: Discovering Data Dependency for Time Series Data

Xiaoou Ding, Yingze Li, Hongzhi Wang, Chen Wang, Yida Liu, Jianmin Wang
DOI: https://doi.org/10.1109/icde60146.2024.00282
2024-01-01
Abstract: Intelligent devices often produce time series data that suffer from significant data quality issues. Although data dependencies have been somewhat beneficial in error detection and data repair, they remain inadequate for accurately representing the data quality of time series datasets. Recognizing the distinctive characteristics of time series data, we introduce a novel data dependency, termed TSDD. It captures the contextual relationships embedded within multivariate time series, thereby enhancing the semantic richness of data quality representations. We analyze the complexity of both the implication and consistency problems for TSDD reasoning, and develop a TSDD discovery algorithm, TSDDISCOVER, which consists of functional structure discovery, allowable error bound determination, and validation of TSDD patterns. Experimental results on real-life datasets verify that TSDDISCOVER efficiently discovers high-quality TSDD patterns. Compared with several leading data quality constraints, TSDD-based error detection achieves an average improvement of 12% in accuracy and 30% in F1 score over other dependency-based detection methods.