Incremental Temporal Frequent Pattern Mining Based on Spark Streaming

Yijian Zhao, Fang Huang, Shaoyong Wang, Keqiang Yu, Chengyuan Zhang
DOI: https://doi.org/10.1109/ihmsc49165.2020.10084
2020-01-01
Abstract: When frequent patterns are mined from streaming data, the passage of time strongly influences the relationships between data items. Two problems are therefore critical: how to track the historical information of the stream efficiently, and how to design a temporal frequency measure that accounts for accumulation over time. We propose an incremental temporal frequent pattern mining scheme built on the Spark Streaming framework. To capture the time property of streaming data, we design a temporal frequency calculation that can both decay and increase over time. To capture the accumulation of information, we put forward the ITFP (Incremental Temporal Frequent Pattern mining) algorithm. To minimize the space cost of recording historical information, the algorithm introduces a TFP-tree (Time Frequent Pattern tree) to store historical frequent patterns and reconstructs the TFP-tree to achieve incremental mining. In addition, temporal sub-frequent patterns are proposed as a backtracking window over historical information to reduce the deviation of the mined frequent patterns. Finally, we evaluate these approaches on a public paper data set. The experimental results show that ITFP mining on Spark Streaming achieves better accuracy and validity, as well as better scalability in a distributed environment.
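The idea of a temporal frequency that decays with time and increases with new occurrences can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything here is an assumption made for illustration: the exponential decay form, the `decay` and `max_size` parameters, and the `DecayedPatternCounter` class itself. The sketch uses a plain dictionary and not a TFP-tree, and it runs on one machine without Spark Streaming.

```python
import math
from collections import defaultdict
from itertools import combinations


class DecayedPatternCounter:
    """Time-decayed support counts for small itemsets over micro-batches.

    Assumption: exponential decay with rate `decay`. This is a stand-in,
    not the temporal frequency measure defined in the paper.
    """

    def __init__(self, decay=0.1, max_size=2):
        self.decay = decay
        self.max_size = max_size
        self.score = defaultdict(float)  # itemset -> decayed frequency
        self.last_seen = {}              # itemset -> time of last update

    def _current(self, pattern, now):
        # Decay the stored score from its last update time up to `now`.
        last = self.last_seen.get(pattern, now)
        return self.score[pattern] * math.exp(-self.decay * (now - last))

    def update(self, transactions, now):
        """Fold in one batch of transactions that arrived at time `now`."""
        for tx in transactions:
            items = sorted(set(tx))
            for k in range(1, self.max_size + 1):
                for pattern in combinations(items, k):
                    # Each occurrence adds 1 to the decayed score.
                    self.score[pattern] = self._current(pattern, now) + 1.0
                    self.last_seen[pattern] = now

    def frequent(self, now, min_support):
        """Return patterns whose decayed frequency at `now` meets the threshold."""
        result = {}
        for pattern in list(self.score):
            value = self._current(pattern, now)
            if value >= min_support:
                result[pattern] = value
        return result


counter = DecayedPatternCounter(decay=math.log(2), max_size=2)
counter.update([["a", "b"], ["a"]], now=0)
counter.update([["a", "b"]], now=1)
print(counter.frequent(now=1, min_support=1.4))
```

With `decay=math.log(2)` every score halves per time unit, so a pattern that stops appearing drops below the threshold after a few batches, while a pattern that keeps recurring stays above it.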