Fast Similarity Matching on Data Stream with Noise.

Zou Peng, Su Liang, Jia Yan, Han WeiHong, Yang ShuQiang
DOI: https://doi.org/10.1109/icdew.2008.4498316
2008-01-01
Abstract: Data streams have attracted researchers from various communities (networking, databases and data mining). A variety of techniques exist for similarity matching in time series datasets. However, subsequence matching over data streams, that is, finding the subsequences that are similar to a query sequence in a progressive, real-time fashion, is a challenging and novel problem because stream data is high-speed, large in volume, potentially unbounded and evolving. In this paper, we first design a bounding technique that prunes as much unnecessary computation as possible. We then propose a novel algorithm that identifies all matching subsequences in a data stream under the DTW (Dynamic Time Warping) distance in a single pass. Experiments on synthetic and real data show that the proposed method is at least 3 times faster than the existing algorithm, SPRING, at the cost of only a few extra bytes.
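To make the "single pass" setting concrete, the sketch below shows one common way to do subsequence matching under DTW on a stream: keep a single column of the DTW matrix and update it once per arriving value, in the style of the SPRING recurrence. It is an illustrative baseline only, not the algorithm or the bounding technique proposed in the paper, neither of which the abstract specifies. The class name `StreamingDTWMatcher`, the helper `find_matches`, the threshold parameter `epsilon`, the use of absolute difference as the point-wise cost, and the rule of reporting every tick whose distance falls under the threshold are all assumptions made for this example.

```python
import math


class StreamingDTWMatcher:
    """Hypothetical single-pass subsequence matcher under DTW (SPRING-style recurrence)."""

    def __init__(self, query, epsilon):
        self.query = list(query)
        self.epsilon = epsilon
        m = len(self.query)
        # d[i]: best DTW cost of aligning query[:i] with a subsequence ending at the current tick.
        # s[i]: 1-based stream index at which that subsequence starts.
        self.d = [0.0] + [math.inf] * m
        self.s = [0] * (m + 1)
        self.t = 0  # number of stream values seen so far

    def update(self, x):
        """Consume one stream value; return (start, end, distance) on a match, else None."""
        self.t += 1
        t = self.t
        m = len(self.query)
        # Row 0 costs nothing, so a match may begin at any tick.
        new_d = [0.0] * (m + 1)
        new_s = [t] * (m + 1)
        for i in range(1, m + 1):
            candidates = [
                (new_d[i - 1], new_s[i - 1]),                   # query advances, same tick
                (self.d[i], self.s[i]),                         # stream advances, same query element
                (self.d[i - 1], self.s[i - 1] if i > 1 else t),  # both advance
            ]
            # Ties on cost are broken in favour of the earlier start.
            best_cost, best_start = min(candidates)
            new_d[i] = abs(x - self.query[i - 1]) + best_cost
            new_s[i] = best_start
        self.d, self.s = new_d, new_s
        if new_d[m] <= self.epsilon:
            return (new_s[m], t, new_d[m])
        return None


def find_matches(stream, query, epsilon):
    """Run the matcher over an iterable and collect every reported match."""
    matcher = StreamingDTWMatcher(query, epsilon)
    matches = []
    for x in stream:
        hit = matcher.update(x)
        if hit is not None:
            matches.append(hit)
    return matches


if __name__ == "__main__":
    print(find_matches([5, 5, 1, 2, 3, 5, 5], [1, 2, 3], epsilon=0.0))
```

Each update costs time and memory proportional to the query length, independent of how much of the stream has already been seen. This simplified version reports a match at every tick where the distance is within `epsilon`, so overlapping matches are not merged.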