Discovering Significant Sequential Patterns in Data Stream by an Efficient Two-Phase Procedure

Huijun Tang,Le Wang,Yangguang Liu,Jiangbo Qian
DOI: https://doi.org/10.1155/2022/5379086
IF: 1.43
2022-01-01
Mathematical Problems in Engineering
Abstract:One essential topic of mining sequential patterns in the data stream is to optimize the time-space computations. However, more importantly, it should pay more attention to the significance of mining results as a large portion of them just response to the user-defined constraints purely by accident and they may have no statistical significance. In this paper, we propose FSSPDS, an efficient two-phase algorithm to discover the significant sequential patterns (SSPs) in the data stream with typical sliding windows, which has never been considered in existing problems. First, for generating SSPs candidates with high-quality, FSSPDS takes testable support and pattern length constraints into account and insignificant patterns were removed timely by a pattern-growth method. In the second phase, appropriate permutation testing is used to test the significance of the SSPs candidates. Exact permutation p values are obtained in a novel combination way based on unconditional Barnard’s test statistic which better reflects the process of data generations and collections. Experimental evaluations show that FSSPDS allows the discovery of SSPs in the data stream and rivals the state-of-the-art approaches efficiently under the control of family-wise error rate (FWER), especially for time efficiency, which was approximately an order of magnitude higher.
What problem does this paper attempt to address?