Abstract: Intuitively, a time series motif is a short time series that recurs in approximately the same form within a larger time series. Such motifs often represent concealed structures, such as heartbeats in an ECG recording, the riff in a pop song, or sleep spindles in EEG sleep data. Motif discovery (MD) is the task of finding such motifs in a given input series. As there are varying definitions of what exactly a motif is, a number of different algorithms exist. As central parameters, they all take the length l of the motif and the maximal distance r between the motif's occurrences. In practice, however, suitable values, especially for r, are very hard to determine upfront, and the motifs found vary widely even for very similar values of r. Accordingly, finding an interesting motif requires extensive trial and error. In this paper, we present a different approach to the MD problem. We define a k-Motiflet as the set of exactly k occurrences of a motif of length l whose maximum pairwise distance is minimal. This turns the MD problem upside down: the central parameter of our approach is not the distance threshold r but the desired number of occurrences k of the motif, which we show is considerably more intuitive and easier to set. Based on this definition, we present exact and approximate algorithms for finding k-Motiflets and analyze their complexity. To further ease the use of our method, we describe statistical tools to automatically determine meaningful values for its input parameters. Through evaluation on several real-world data sets and comparison to four state-of-the-art MD algorithms, we show that our proposed algorithm is both quantitatively superior to its competitors, finding larger motif sets at higher similarity, and qualitatively better, leading to clearer and easier-to-interpret motifs without any need for manual tuning.
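
To make the definition concrete, the following is a minimal brute-force sketch of the k-Motiflet objective: among all sets of k subsequences of length l, pick the one whose maximum pairwise distance is smallest. It is not the exact or approximate algorithm the paper presents; it enumerates every candidate set and is only feasible for tiny inputs. The function names, the use of z-normalized Euclidean distance, and the rule that occurrences must not overlap are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations
import math


def znorm(window):
    """Z-normalize a window (assumed preprocessing, not stated in the abstract)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std < 1e-12:
        # Constant window: map to all zeros to avoid division by zero.
        return [0.0] * n
    return [(x - mean) / std for x in window]


def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_k_motiflet(series, k, l):
    """Return (start_offsets, extent) of the k-subset with minimal max pairwise distance.

    Hypothetical helper for illustration; runtime grows combinatorially with len(series).
    """
    if k < 2:
        raise ValueError("k must be at least 2 to have a pairwise distance")
    starts = range(len(series) - l + 1)
    windows = [znorm(series[s:s + l]) for s in starts]
    best_extent, best_set = math.inf, None
    for candidate in combinations(starts, k):
        # Assumption: occurrences may not overlap. Offsets are sorted,
        # so checking consecutive pairs is sufficient.
        if any(b - a < l for a, b in zip(candidate, candidate[1:])):
            continue
        # Extent = maximum pairwise distance within the candidate set.
        extent = max(
            distance(windows[i], windows[j])
            for i, j in combinations(candidate, 2)
        )
        if extent < best_extent:
            best_extent, best_set = extent, candidate
    return best_set, best_extent


# Toy series with the same pattern planted three times between filler values.
pattern = [0, 1, 3, 1, 0]
series = [5, 2, 7] + pattern + [4, 9, 1, 6] + pattern + [8, 3, 2, 9] + pattern + [7, 1]
motiflet, extent = brute_force_k_motiflet(series, k=3, l=5)
print(motiflet, extent)
```

Note that the only inputs are k and l; no distance threshold r is supplied. The distance of the resulting set is an output of the search rather than a parameter, which is the inversion the abstract describes.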

A Framework for Discovering Variable-length Motifs in Medical Data Streams

LoCoMotif: discovering time-warped motifs in time series

Efficient Algorithms for Finding a Longest Common Increasing Subsequence

Discovering Leitmotifs in Multidimensional Time Series

Efficient Consensus Motif Discovery of All Lengths in Multiple Time Series

Exploring variable-length time series motifs in one hundred million length scale

Motiflets -- Simple and Accurate Detection of Motifs in Time Series

Admissible Time Series Motif Discovery with Missing Data

Novel algorithms for LDD motif search

An effective method to analyze variations of high-dimensional patterns over medical streams

Discovering Local Patterns From Multiple Temporal Sequences

Segmental semi-markov model based online series pattern detection under arbitrary time scaling

Pattern Recognition for Large-Scale and Incremental Time Series in Healthcare

An Advanced Segmental Semi-Markov Model Based Online Series Pattern Detection

Online Series Pattern Detection Based on Advanced Segmental Semi-Markov Model

A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences

Exploring Scalable Parallelization for Edit Distance-Based Motif Search

MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis

Time series motifs discovery under DTW allows more robust discovery of conserved structure

Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation

Mining Scalable Pattern Based on Temporal Logic over Data Streams