Finding Good-Quality Surprising Patterns in Time Series Data
Aiguo Li, Zhanhuai Li
DOI: https://doi.org/10.3969/j.issn.1000-2758.2007.03.022
2007-01-01
Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Abstract: Aim. Previous methods for finding surprising patterns in time series data suffer, in our opinion, from three shortcomings: (1) they use only a very limited set of shape features of the time series data; (2) they ignore the statistical features of the time series data; and (3) they do not recognize that suitable modeling can reduce the number of subsequences reported as having surprising patterns. We present what we believe to be a better method, which the full paper explains in detail. This abstract adds some pertinent remarks on the two topics of that explanation: (1) the formal description of a surprising pattern and (2) the algorithm for finding surprising patterns. For the first topic, we give five definitions and a theorem with its proof. The second topic has three subtopics: the proposed algorithm (subtopic 2.1), the determination of the threshold values (subtopic 2.2), and the analysis of the computational complexity of the proposed algorithm (subtopic 2.3). For the second topic, we give a five-step flowchart, based on the theorem from the first topic, for finding surprising patterns. Most importantly, in subtopic 2.1 we explain the suitable modeling that reduces the number of subsequences reported as having surprising patterns. The algorithm achieves a data compression ratio of about 32:1 or 64:1, so it can be applied to massive time series databases. The experimental results, given in a figure in the full paper, provide preliminary evidence that the proposed method not only finds the surprising patterns defined by Keogh et al. [1] but also, through suitable modeling, discards patterns in the time series data that are not really surprising.
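The abstract does not say how the 32:1 or 64:1 compression ratio is obtained. As a rough illustration only, the sketch below assumes a common approach: replace each non-overlapping window of 32 (or 64) points with a single summary value, here the window mean (piecewise aggregate approximation, PAA). The functions `paa` and `compression_ratio` are hypothetical names, and this is not necessarily the representation or the shape and statistical features the authors use.

```python
def paa(series, segment_len):
    """Piecewise aggregate approximation (assumed here for illustration):
    replace each non-overlapping window of `segment_len` points by its mean."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_segments = len(series) // segment_len
    # Trailing points that do not fill a whole window are dropped.
    return [
        sum(series[i * segment_len:(i + 1) * segment_len]) / segment_len
        for i in range(n_segments)
    ]


def compression_ratio(original_len, compressed_len):
    """Number of raw points represented by each stored value."""
    return original_len / compressed_len


if __name__ == "__main__":
    raw = [float(t) for t in range(4096)]  # a toy series of 4096 points
    for w in (32, 64):
        reduced = paa(raw, w)
        print(w, len(reduced), compression_ratio(len(raw), len(reduced)))
```

With one value stored per window, the compression ratio equals the window length. Searching for surprising subsequences over the reduced series therefore touches 32 or 64 times fewer values than a search over the raw data.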