Mining Top-K Frequent and Flexible Pattern from Sequences

Zhang Junyan, Min Fan
DOI: https://doi.org/10.1109/anthology.2013.6784804
2013-01-01
Abstract: Pattern mining is a popular topic in biological sequence analysis. With the introduction of wildcard gaps, more interesting patterns can be mined. In this paper, we propose a new definition of pattern frequency under which the Apriori property holds. We define a pattern mining problem called Mining top-K Frequent Patterns (MFP), where gaps are mined instead of specified. Compared with existing problems, MFP does not require any domain knowledge from the user. However, theoretical analysis and experimental results show that MFP favors inflexible patterns. We therefore define another problem, Mining top-K Frequent and Flexible Patterns (MF2P), in which the flexibility threshold of each gap is specified by the user. We develop algorithms with polynomial complexity for both problems. Patterns can grow from both sides. Some interesting biological patterns mined by our algorithms are discussed.
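To make the notion of a pattern with wildcard gaps concrete, the following is a minimal Python sketch. It is not the paper's algorithm, and it does not use the paper's frequency definition, which the abstract does not state. It assumes a pattern is a string of characters plus one (min, max) wildcard range per gap, scores a pattern by its raw number of matching position tuples, and ranks all two-character patterns for one fixed gap range. The names count_occurrences and top_k_pairs are hypothetical.

```python
import heapq
from itertools import product
from typing import List, Tuple


def count_occurrences(seq: str, chars: str, gaps: List[Tuple[int, int]]) -> int:
    """Count position tuples in seq that match chars, where gaps[i] = (lo, hi)
    bounds the number of wildcard positions between chars[i] and chars[i + 1].

    Assumption: this raw count stands in for "frequency"; the paper defines
    its own measure, which is not reproduced here.
    """
    if len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one gap range per adjacent character pair")
    # ways[j] = number of matches of the pattern prefix that end at position j
    ways = [1 if c == chars[0] else 0 for c in seq]
    for ch, (lo, hi) in zip(chars[1:], gaps):
        nxt = [0] * len(seq)
        for j, c in enumerate(seq):
            if c != ch:
                continue
            # The previous character sits at i, with g = j - i - 1 wildcards between.
            for g in range(lo, hi + 1):
                i = j - g - 1
                if i >= 0:
                    nxt[j] += ways[i]
        ways = nxt
    return sum(ways)


def top_k_pairs(seq: str, alphabet: str, gap: Tuple[int, int], k: int):
    """Rank every two-character pattern over alphabet by its occurrence count
    for one fixed gap range and return the k best as (count, pattern) tuples."""
    scored = [
        (count_occurrences(seq, a + b, [gap]), a + b)
        for a, b in product(alphabet, repeat=2)
    ]
    return heapq.nlargest(k, scored)
```

For example, in the sequence ATATA the pattern A, gap of exactly one wildcard, A matches at positions (0, 2) and (2, 4), so count_occurrences("ATATA", "AA", [(1, 1)]) returns 2. Widening the gap range to (0, 3) also admits (0, 4) and the count becomes 3. A wider gap range can only add matches, never remove them, so the raw count used here cannot decrease as the range widens.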