Efficient Row Pattern Matching Using Pattern Hierarchies for Sequence OLAP

Yuya Nasu,Hiroyuki Kitagawa,Kosuke Nakabasami
DOI: https://doi.org/10.1007/978-3-030-27520-4_7
2019-01-01
Abstract:Sequence OLAP is a variant of OLAP for sequence data analysis such as analysis of RFID log and person trip data. It extracts pattern occurrences of the given patterns (e.g., state transition pattern S1→S2documentclass[12pt]{minimal}usepackage{amsmath}usepackage{wasysym}usepackage{amsfonts}usepackage{amssymb}usepackage{amsbsy}usepackage{mathrsfs}usepackage{upgreek}setlength{oddsidemargin}{-69pt}egin{document}$$S_1 ightarrow S_2$$end{document}, movement pattern A→Bdocumentclass[12pt]{minimal}usepackage{amsmath}usepackage{wasysym}usepackage{amsfonts}usepackage{amssymb}usepackage{amsbsy}usepackage{mathrsfs}usepackage{upgreek}setlength{oddsidemargin}{-69pt}egin{document}$$A ightarrow B$$end{document}) on sequence data and executes multi-dimensional aggregate using OLAP operations (such as drill-down and roll-up) and pattern OLAP operations (such as pattern-drill-down and pattern-roll-up). The pattern OLAP operations are specific to Sequence OLAP and involve a hierarchy of multiple patterns. When sequence data is stored in relational databases as sequences of rows, row pattern matching finds all subsequences of rows which match a given pattern. To do Sequence OLAP, especially pattern OLAP operations, on relational databases, it is required to execute row pattern matching for such a hierarchy of multiple patterns and identify parent-child relationships among pattern occurrences. Generally, row pattern matching needs sequential scan of a large table and is an expensive operation. If row pattern matching is executed individually for each pattern, it is very time consuming. Therefore, it is strongly demanded to execute multiple row pattern matching for a given hierarchy of patterns efficiently. This paper formalizes a pattern hierarchy model for Sequence OLAP and proposes a very efficient algorithm to do multiple row pattern matching using SP-NFA (Shared Prefix Nondeterministic Finite Automaton). In experiments, we implement our algorithm in PostgreSQL and evaluate the effectiveness of the proposal.
What problem does this paper attempt to address?