Time Series Data Cleaning under Expressive Constraints on Both Rows and Columns

Xiaoou Ding, Genglong Li, Hongzhi Wang, Chen Wang, Yichen Song
DOI: https://doi.org/10.1109/icde60146.2024.00283
2024-01-01
Abstract: Time series data generated by thousands of sensors suffer from data quality problems. Traditional constraint-based techniques have greatly contributed to data cleaning applications. However, cleaning methods that support expressive constraints on time series data remain insufficient. Given the notable characteristics of time series data, existing cleaning approaches struggle to provide good repair solutions. To address these challenges, we propose a novel data cleaning method for time series that incorporates expressive constraints supporting arithmetic operations between attributes and time context. In the violation detection phase, we introduce specialized violation degree quantification functions and design a violation cell discovery algorithm to identify errors hidden in time series data. In the data repairing phase, we formalize the cleaning task as a constrained optimization problem and develop a novel repair objective function that considers both modification costs and the degree of conformance to constraints. We effectively reduce the repair search space by evaluating time-context constraints, and we propose a bidirectional repairing algorithm. We also provide a theoretical analysis of the proposed repairing method. Experimental results on three real-world IoT datasets across five metrics demonstrate that our proposed method outperforms seven state-of-the-art cleaning techniques specialized for time series data. Specifically, we achieve a 60% improvement in repairing effectiveness and a 70% reduction in time costs with our designed cleaning strategy.
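To make the idea of constraints on both rows and columns concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical row constraint (power should equal voltage times current within a tolerance) and a hypothetical column constraint (voltage may change by at most `max_rate` per time unit), and it uses a simple "amount of excess" as the violation degree. All function names, thresholds, and sample values are illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Tuple

# One record: (timestamp, voltage, current, power). The schema is an assumption.
Row = Tuple[float, float, float, float]


def row_violation(voltage: float, current: float, power: float, tol: float) -> float:
    """Hypothetical row constraint: |voltage * current - power| <= tol.

    Returns the amount by which the constraint is exceeded (0.0 if satisfied).
    """
    return max(0.0, abs(voltage * current - power) - tol)


def column_violation(prev: float, curr: float, dt: float, max_rate: float) -> float:
    """Hypothetical time-context constraint: |curr - prev| / dt <= max_rate.

    Returns the amount by which the change rate exceeds max_rate.
    """
    return max(0.0, abs(curr - prev) / dt - max_rate)


def detect(rows: List[Row], tol: float, max_rate: float) -> List[Tuple[int, float, float]]:
    """Return (index, row_degree, column_degree) for every violating record."""
    found = []
    for i, (t, v, c, p) in enumerate(rows):
        row_deg = row_violation(v, c, p, tol)
        col_deg = 0.0
        if i > 0:
            t_prev, v_prev, _, _ = rows[i - 1]
            col_deg = column_violation(v_prev, v, t - t_prev, max_rate)
        if row_deg > 0 or col_deg > 0:
            found.append((i, row_deg, col_deg))
    return found


# Illustrative data: the voltage at index 2 is a spike.
series = [
    (0, 10.0, 2.0, 20.0),
    (1, 10.5, 2.0, 21.0),
    (2, 30.0, 2.0, 21.0),
    (3, 11.0, 2.0, 22.0),
]
violations = detect(series, tol=0.5, max_rate=2.0)
print(violations)
```

In this toy data the record at index 3 is also flagged by the column constraint, although the erroneous value sits at index 2. This illustrates why detecting a violated constraint is not the same as locating the erroneous cell, which is the role the abstract assigns to a violation cell discovery step.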