Abstract: Outlier detection plays an important role in preprocessing sequential datasets to obtain clean, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier and is examined further to judge whether it is really an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). Two paths are then employed: the expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS. The ratio of the length of the expected path to that of the measured path indicates whether there exist outliers in the SOS. Third, if the SOS does contain outliers and has $N$ suspicious outliers, then $2^N - 2$ remaining paths are generated by removing $k$ ($0 < k < N$) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and it indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme works from a dynamical perspective and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the results verify that it achieves higher detection quality on sequential datasets. In addition, the proposed scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
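The third step can be illustrated with a minimal sketch that enumerates the $2^N - 2$ remaining paths of an SOS and computes a DSOF-style ratio for each. The abstract does not define how the expected path is constructed, so the sketch assumes, purely for illustration, that it is the straight segment between the two normal objects bracketing the SOS and that distances are Euclidean. The names `path_length`, `remaining_paths`, and `dsof_table` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import dist


def path_length(points):
    """Length of the polyline that visits `points` in the given order."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def remaining_paths(n):
    """Yield the index tuples of removed objects for every remaining path.

    Removing k objects for each 0 < k < n gives 2**n - 2 combinations.
    """
    for k in range(1, n):
        yield from combinations(range(n), k)


def dsof_table(before, sos, after):
    """Map each tuple of removed indices to a DSOF-style ratio.

    ASSUMPTION (not from the paper): the expected path is the straight
    segment from `before` to `after`, the normal objects on either side
    of the SOS. The ratio is measured length / expected length.
    """
    expected = path_length([before, after])
    table = {}
    for removed in remaining_paths(len(sos)):
        kept = [p for i, p in enumerate(sos) if i not in removed]
        table[removed] = path_length([before, *kept, after]) / expected
    return table
```

For a toy SOS `[(1, 0), (2, 5), (3, 0)]` between the normal objects `(0, 0)` and `(4, 0)`, the table has six entries, and removing only the middle object gives a ratio of 1.0 because the remaining path coincides with the assumed expected path. How the ratios are thresholded or ranked to reach a final decision is not specified in the abstract, so the sketch stops at computing them.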

Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data

Mining Spread Patterns of Spatio-temporal Co-occurrences over Zones

Accelerated Frequent Closed Sequential Pattern Mining for Uncertain Data

OPP-Miner: Order-Preserving Sequential Pattern Mining for Time Series

HAOP-Miner: Self-adaptive high-average utility one-off sequential pattern mining

Anomaly Rule Detection in Sequence Data

A Two-Phase Approach for Unexpected Pattern Mining.

HANP-Miner: High average utility nonoverlapping sequential pattern mining

Towards Top-$K$ Non-Overlapping Sequential Patterns

Fast Utility Mining on Complex Sequences

On-shelf Utility Mining of Sequence Data

MWFP-outlier: Maximal weighted frequent-pattern-based approach for detecting outliers from uncertain weighted data streams

An Outlier Detection Scheme for Dynamical Sequential Datasets.

Repetitive nonoverlapping sequential pattern mining

Mining long sequential patterns in a noisy environment.

On Top-K Closed Sequential Patterns Mining

Self-adaptive nonoverlapping sequential pattern mining

ONP-Miner: One-off Negative Sequential Pattern Mining

Discovering Significant Sequential Patterns in Data Stream by an Efficient Two-Phase Procedure

Scalable Order-Preserving Pattern Mining

UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams