Abstract: Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a permutation of the columns of S under which the entry values of each row in S are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes and finding customers with similar preferences. However, noise is ubiquitous in real data matrices due to variable experimental conditions and measurement errors, which makes conventional OPSM mining algorithms inapplicable. No previous work has combated uncertain value intervals using the well-established possible world semantics. We establish two different definitions of significant OPSMs based on the possible world semantics: (1) expected support based and (2) probabilistic frequentness based. An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, with a closed-form formula derived to efficiently handle the special case of uniform value distribution, and an accurate cubic spline approximation approach that works well with any uncertain value distribution. To check probabilistic frequentness efficiently, several effective pruning rules are designed to prune insignificant OPSMs; two approximation techniques, based on the Poisson and Gaussian distributions, respectively, are proposed for further speedup. These techniques are integrated into our two OPSM mining algorithms, based on prefix-projection and Apriori, respectively. We further parallelize our prefix-projection based mining algorithm using PrefixFPM, a recently proposed framework for parallel frequent pattern mining, and achieve a good speedup as the number of CPU cores increases. Extensive experiments on real microarray data demonstrate that the OPSMs found by our algorithms have a much higher quality than those found by existing approaches.
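
To make the notion of expected support concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each uncertain entry is an independent discrete distribution (a dict mapping value to probability), whereas the abstract deals with continuous value intervals. The function names `prob_strictly_increasing` and `expected_support` are hypothetical. The dynamic program tracks, for each possible value of the current column, the probability that all earlier columns in the permutation are strictly increasing and end below it.

```python
from typing import Dict, List, Sequence

# Assumption: an uncertain entry is a discrete distribution {value: probability}.
Dist = Dict[float, float]


def prob_strictly_increasing(dists: Sequence[Dist]) -> float:
    """Return P(X_1 < X_2 < ... < X_k) for independent discrete X_j."""
    if not dists:
        return 1.0  # an empty sequence is trivially increasing
    # prev[v] = P(X_1 < ... < X_j and X_j = v) for the columns seen so far
    prev = dict(dists[0])
    for dist in dists[1:]:
        cur = {}
        for v, p in dist.items():
            # Probability mass of prefixes whose last value is strictly below v
            mass_below = sum(q for u, q in prev.items() if u < v)
            cur[v] = p * mass_below
        prev = cur
    return sum(prev.values())


def expected_support(rows: List[List[Dist]], perm: Sequence[int]) -> float:
    """Sum, over all rows, of the probability that the row supports perm."""
    return sum(
        prob_strictly_increasing([row[c] for c in perm]) for row in rows
    )


# Toy uncertain matrix with two rows and three columns.
rows = [
    [{1.0: 0.5, 3.0: 0.5}, {2.0: 1.0}, {4.0: 1.0}],  # first entry is uncertain
    [{1.0: 1.0}, {2.0: 1.0}, {3.0: 1.0}],            # fully certain row
]
print(expected_support(rows, [0, 1, 2]))
```

In the toy matrix, the first row supports the column order (0, 1, 2) only when its first entry takes the value 1.0, so it contributes 0.5, while the certain second row contributes 1. Under the expected-support definition, a column permutation would be considered significant when this sum reaches a user-chosen threshold.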

Scalable Order-Preserving Pattern Mining

OPP-Miner: Order-Preserving Sequential Pattern Mining for Time Series

OPR-Miner: Order-preserving rule mining for time series

Approximate Order-Preserving Pattern Mining for Time Series

AOP-Miner: Approximate Order-Preserving Pattern Mining for Time Series

Mining Order-Preserving Submatrices Based On Frequent Sequential Pattern Mining

Co-occurrence order-preserving pattern mining

Co-occurrence Order-Preserving Pattern Mining with Keypoint Alignment for Time Series

Order-preserving pattern mining with forgetting mechanism

Mining Frequent Ordered Patterns without Candidate Generation

Top-k contrast order-preserving pattern mining

A new approach for mining deep order-preserving submatrices

A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

FastOPM - a Practical Method for Partial Match of Time Series

COPP-Miner: Top-K Contrast Order-Preserving Pattern Mining for Time Series Classification

Towards Order-Preserving Submatrix Search And Indexing

Mining Frequent Ordered Patterns

Near-optimal Top-k Pattern Mining

A New Algorithm for Mining Maximal Frequent Patterns

Mining Scalable Pattern Based on Temporal Logic over Data Streams

Mining Order-Preserving Submatrices Under Data Uncertainty: A Possible-World Approach and Efficient Approximation Methods