InfoMiner+: Mining Partial Periodic Patterns with Gap Penalties.

J Yang,W Wang,PS Yu
DOI: https://doi.org/10.1109/icdm.2002.1184039
2002-01-01
Abstract:In this paper we focus on mining periodic patterns allowing some degree of imperfection in the form of random replacement from a perfect periodic pattern. Information gain was proposed to identify patterns with events of vastly different occurrence frequencies and adjust for deviation from a pattern. However, it does not involve a penalty if there exists some gap between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences that a pattern repeats perfectly (or near perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we need to find a subsequence S' such that for a pattern P, the generalized information gain of P in S' is high. This is particularly useful in locating repeats in DNA sequences. In this paper, we developed an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and associated subsequences.
What problem does this paper attempt to address?