Time Series Data Cleaning Based on Dynamic Speed Constraints.

Guohui Ding, Chenyang Li, Ru Wei, Shasha Sun, Zhaoyu Liu, Chunlong Fan
DOI: https://doi.org/10.1007/978-3-030-62008-0_33
2020-01-01
Abstract: Errors are ubiquitous in time series data because sensors are often unstable. Existing constraint-based approaches can repair abnormal values effectively. The constraint typically refers to the speed range of data changes: if the speed at which the data changes falls outside this range, the data point is identified as abnormal, violating the constraint, and in need of repair. For example, if the hourly oil consumption of a sedan is negative or greater than 15 gallons, the reading is probably abnormal. However, existing methods are limited to data whose value change speed is stable. They perform poorly on data streams with sharp fluctuations, because constraints based on an a priori, fixed speed range might miss most abnormal data. To fill this gap, an online cleaning approach based on dynamic speed constraints is proposed for time series data with fluctuating value change speed. The proposed dynamic constraints are not determined in advance but adapt as the data changes over time. A dual window mechanism is devised to reduce the global optimization problem of data repair to a local optimization problem. The classic minimum change principle and median principle are introduced for data repair. Because the minimum change principle breaks down when consecutive data points violate the constraints, we propose using the boundary of the corresponding candidate repair set as the repair strategy. Extensive experiments on real datasets demonstrate that the proposed approach achieves higher repair accuracy than traditional approaches.
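To make the idea of a self-adaptive speed constraint concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that the speed bounds are re-estimated from a sliding window of recent speeds as the median plus or minus k times the median absolute deviation, and that a violating point is repaired by clamping it to the nearest boundary of its candidate range. The names `speed_bounds` and `repair_stream` and the parameters `window`, `k`, and `min_spread` are hypothetical and do not come from the paper; the dual window mechanism is not modeled here.

```python
from statistics import median


def speed_bounds(recent_speeds, k=3.0, min_spread=0.5):
    """Hypothetical dynamic constraint: median speed +/- k * MAD.

    min_spread keeps the range from collapsing to a single value
    when the recent speeds are all identical.
    """
    m = median(recent_speeds)
    mad = median([abs(s - m) for s in recent_speeds])
    spread = max(k * mad, min_spread)
    return m - spread, m + spread


def repair_stream(times, values, window=5, k=3.0, min_spread=0.5):
    """Repair a series online using bounds derived from recent speeds.

    Assumes strictly increasing timestamps. The first `window` speeds
    are accepted as-is because no constraint can be estimated yet.
    """
    repaired = [values[0]]
    speeds = []
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        x = values[i]
        if len(speeds) >= window:
            s_min, s_max = speed_bounds(speeds[-window:], k, min_spread)
            # Candidate range implied by the previous repaired point.
            lo = repaired[-1] + s_min * dt
            hi = repaired[-1] + s_max * dt
            # Clamp to the nearest boundary if the point violates the range.
            x = min(max(x, lo), hi)
        speeds.append((x - repaired[-1]) / dt)
        repaired.append(x)
    return repaired


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5, 6, 7]
    values = [0, 1, 2, 3, 4, 5, 60, 7]  # the reading 60 is a spike
    print(repair_stream(times, values))
```

Because the bounds are recomputed from the repaired stream at every step, a series whose typical speed drifts over time is judged against its recent behavior rather than against a single range fixed in advance. The window length and the multiplier k are tuning choices in this sketch and would need to be set per dataset.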