Abstract: The field of general time series analysis has recently begun to explore unified modeling, where a common architectural backbone can be retrained on a specific task for a specific dataset. In this work, we approach unification from a complementary vantage point: unification across tasks and domains. To this end, we explore the impact of discrete, learnt, time series data representations that enable generalist, cross-domain training. Our method, TOTEM, or TOkenized Time Series EMbeddings, proposes a simple tokenizer architecture that embeds time series data from varying domains using a discrete vectorized representation learned in a self-supervised manner. TOTEM works across multiple tasks and domains with minimal to no tuning. We study the efficacy of TOTEM with an extensive evaluation on 17 real world time series datasets across 3 tasks. We evaluate both the specialist (i.e., training a model on each domain) and generalist (i.e., training a single model on many domains) settings, and show that TOTEM matches or outperforms previous best methods on several popular benchmarks. The code can be found at:

What problem does this paper attempt to address?

The main goal of this paper is to propose a general time series analysis method that works across different tasks and data domains with minimal to no tuning for each specific dataset. Specifically, the paper introduces TOTEM (TOkenized Time Series EMbeddings), an architecture based on the vector quantized variational autoencoder (VQ-VAE) that produces discrete time series embedding representations.

### Main Contributions:

1. **TOTEM Architecture**: A simple yet powerful time series tokenization architecture is proposed, which performs well across different tasks and data domains with minimal adjustment.
2. **Performance**: Despite its simplicity, TOTEM performs comparably to, or better than, existing state-of-the-art methods on multiple popular benchmark datasets.
3. **Zero-shot Generalization**: In extensive experiments, TOTEM performs strongly in in-domain testing and also surpasses leading existing methods in zero-shot testing in the generalist setting (i.e., training a single model across multiple data domains).

### Technical Details:

- **Data Engineering**: TOTEM avoids complex data preprocessing steps and operates directly on time steps, making it suitable for time series data with different sampling rates.
- **VQ-VAE Architecture**: TOTEM employs a VQ-VAE architecture that creates non-overlapping time series tokens. The architecture is applicable to time series of different lengths, sensor counts, and task types.
- **Downstream Tasks**: TOTEM can be applied to various time series analysis tasks, including imputation, anomaly detection, and forecasting. Forecasting requires additional modeling components, such as Transformer encoders.

### Experimental Results:

- In the imputation task, TOTEM trained as a specialist had the highest average number of wins (52.1%), and it also significantly outperformed other models in the generalist training setting.
- In the anomaly detection task, TOTEM also performed well: it had the highest average number of wins (33.3%) in the specialist training setting and significantly outperformed competitors in both in-domain and zero-shot tests in the generalist setting.

In summary, TOTEM aims to provide a unified framework that can effectively handle various time series analysis tasks and generalize across different data domains.
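To make the idea of non-overlapping discrete tokens more concrete, the following is a minimal sketch of the vector quantization step only, not TOTEM's actual implementation. It assumes a tiny hand-written codebook and quantizes raw patches of the series directly. In a real VQ-VAE a learned encoder would first map the series to latent vectors, and the codebook would be learned during training. All function names, the patch length, and the codebook values are hypothetical.

```python
from typing import List, Sequence


def split_into_patches(series: Sequence[float], patch_len: int) -> List[List[float]]:
    """Cut a 1-D series into non-overlapping patches, dropping any ragged tail."""
    n_patches = len(series) // patch_len
    return [list(series[i * patch_len:(i + 1) * patch_len]) for i in range(n_patches)]


def nearest_code(patch: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `patch` (squared L2)."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((p - c) ** 2 for p, c in zip(patch, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def tokenize(series: Sequence[float], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each non-overlapping patch to the id of its nearest codebook entry."""
    patch_len = len(codebook[0])
    return [nearest_code(p, codebook) for p in split_into_patches(series, patch_len)]


def detokenize(tokens: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[float]:
    """Rebuild an approximate series by concatenating the selected codebook entries."""
    out: List[float] = []
    for t in tokens:
        out.extend(codebook[t])
    return out


# Hypothetical codebook of length-2 patches: flat, rising, falling.
# In a trained VQ-VAE these vectors would be learned, not written by hand.
CODEBOOK = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

series = [0.1, -0.1, 0.0, 0.9, 1.1, 0.2, 0.3]
tokens = tokenize(series, CODEBOOK)            # one integer id per patch
reconstruction = detokenize(tokens, CODEBOOK)  # lossy reconstruction from ids
print(tokens, reconstruction)
```

Because each token covers a fixed number of time steps and the lookup is applied patch by patch, the same routine handles series of any length. This is one plausible reading of why such a tokenizer can be shared across datasets, though the paper's learned encoder and decoder are omitted here.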

TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
