Abstract: Mining frequent sequences (FSs) with constraints in a sequence database (SDB) is a critical task in data mining, as it forms the basis for discovering meaningful patterns in sequential data. However, traditional algorithms that mine constrained FSs directly from the SDB often perform poorly, especially on large SDBs and at low support thresholds. Moreover, constraint-based sequence mining algorithms face additional challenges, such as increased runtime and memory usage, particularly when constraints change frequently. To address these issues, this paper introduces an efficient method for generating FSs that contain a user-defined sub-sequence; that is, the discovered FSs must be super-sequences of the given sub-sequence. Rather than discovering these sequences directly from the SDB in the traditional manner, the proposed method quickly generates constrained FSs from frequent closed sequences (FCSs) and frequent generator sequences (FGSs). The process categorizes constrained FSs into equivalence classes, each represented by its FCSs and FGSs. An efficient method is then adapted to generate the constrained FSs within each class from these representative elements. Additionally, a novel technique called the Constraint Satisfaction Technique (CST) is introduced to avoid computationally expensive checks of the inclusion relation among sequences during generation. Finally, a novel algorithm named MFS-SubSC is developed based on the proposed theoretical results to generate all constrained FSs efficiently. Experimental results demonstrate that the proposed algorithm surpasses state-of-the-art methods in terms of runtime, memory usage, and scalability.
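
To make the sub-sequence constraint concrete, the following is a minimal Python sketch of the inclusion relation and of a naive post-filter that keeps only the frequent sequences containing a given sub-sequence. It is not MFS-SubSC and does not use FCSs, FGSs, or CST; it only illustrates the kind of per-sequence inclusion check that the abstract says the proposed method tries to avoid. The representation (a sequence as a tuple of frozensets) and the names `s`, `is_subsequence`, `support`, and `filter_constrained` are assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def s(*itemsets: str) -> Seq:
    """Hypothetical helper: build a sequence from strings, one string per itemset."""
    return tuple(frozenset(items) for items in itemsets)


def is_subsequence(sub: Sequence[Itemset], seq: Sequence[Itemset]) -> bool:
    """Return True if `sub` is contained in `seq`.

    Each itemset of `sub` must be a subset of a distinct itemset of `seq`,
    in the same order. Greedy leftmost matching is sufficient here.
    """
    pos = 0
    for itemset in sub:
        # Advance to the first remaining itemset of `seq` that covers `itemset`.
        while pos < len(seq) and not itemset <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def support(pattern: Sequence[Itemset], sdb: Sequence[Seq]) -> int:
    """Count the input sequences of the database that contain `pattern`."""
    return sum(1 for seq in sdb if is_subsequence(pattern, seq))


def filter_constrained(frequent: Sequence[Seq], constraint: Seq) -> List[Seq]:
    """Naive baseline: keep the frequent sequences that are super-sequences
    of `constraint`, checking inclusion once per candidate."""
    return [p for p in frequent if is_subsequence(constraint, p)]
```

For example, `filter_constrained([s("a"), s("a", "b"), s("b", "c")], s("b"))` keeps only the two patterns that contain the itemset `{b}`. This baseline performs one inclusion check per candidate pattern and must be rerun whenever the constraint changes.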

MFS-SubSC: an efficient algorithm for mining frequent sequences with sub-sequence constraint

Approximate mining of global closed frequent itemsets over data streams

A New Algorithm for Mining Global Frequent Itemsets in a Stream

Accelerated Frequent Closed Sequential Pattern Mining for Uncertain Data

An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

Mining Frequent Induced Subtree Patterns with Subtree-Constraint

CONTOUR: an Efficient Algorithm for Discovering Discriminating Subsequences

Efficient Algorithms for Finding a Longest Common Increasing Subsequence

Efficiently Mining Closed Subsequences with Gap Constraints

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database

Maximal Frequent Item Sequences Mining of Datasets with Few Attributes and Large Instances

Efficient Mining of Gap-Constrained Subsequences and Its Various Applications

Constraint-based sequence mining using constraint programming

Efficient Mining of Frequent Sequence Generators

Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

Frequent Closed Sequence Mining Without Candidate Maintenance

Indexing And Mining Of The Local Patterns In Sequence Database

Mining Sequential Patterns with Constraints in Large Databases

Mvs-Match: An Efficient Subsequence Matching Approach Based On The Series Synopsis

Fast Utility Mining on Complex Sequences

Discriminating Subsequence Discovery for Sequence Clustering