TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

Sabera Talukder,Yisong Yue,Georgia Gkioxari
2024-02-26
Abstract: The field of general time series analysis has recently begun to explore unified modeling, where a common architectural backbone can be retrained on a specific task for a specific dataset. In this work, we approach unification from a complementary vantage point: unification across tasks and domains. To this end, we explore the impact of discrete, learnt, time series data representations that enable generalist, cross-domain training. Our method, TOTEM, or TOkenized Time Series EMbeddings, proposes a simple tokenizer architecture that embeds time series data from varying domains using a discrete vectorized representation learned in a self-supervised manner. TOTEM works across multiple tasks and domains with minimal to no tuning. We study the efficacy of TOTEM with an extensive evaluation on 17 real world time series datasets across 3 tasks. We evaluate both the specialist (i.e., training a model on each domain) and generalist (i.e., training a single model on many domains) settings, and show that TOTEM matches or outperforms previous best methods on several popular benchmarks. The code can be found at:
Machine Learning
What problem does this paper attempt to address?
The paper proposes a general-purpose time series analysis method that works across tasks and data domains without specialized tuning or retraining for each dataset. The method, TOTEM (TOkenized Time Series EMbeddings), is built on a vector quantized variational autoencoder (VQ-VAE) that produces discrete embeddings of time series.

### Main Contributions:

1. **TOTEM Architecture**: A simple time series tokenization architecture that performs well across tasks and data domains with minimal to no tuning.
2. **Performance**: Despite its simplicity, TOTEM matches or outperforms existing state-of-the-art methods on several popular benchmark datasets.
3. **Zero-shot Generalization**: In the generalist setting (i.e., a single model trained on many domains), TOTEM performs strongly in in-domain testing and also surpasses leading existing methods in zero-shot testing.

### Technical Details:

- **Data Engineering**: TOTEM avoids complex data preprocessing and operates directly on time steps, which makes it suitable for time series with different sampling rates.
- **VQ-VAE Architecture**: TOTEM uses a VQ-VAE that creates non-overlapping time series tokens. The same architecture applies to time series of different lengths, sensor counts, and task types.
- **Downstream Tasks**: TOTEM can be applied to imputation, anomaly detection, and forecasting. Forecasting requires an additional modeling component, such as a Transformer encoder, on top of the tokens.

### Experimental Results:

- In the imputation task, the specialist TOTEM model (trained on each domain separately) had the highest average number of wins (52.1%), and the generalist TOTEM model also significantly outperformed other models.
- In the anomaly detection task, the specialist TOTEM model had the highest average number of wins (33.3%), and the generalist model significantly outperformed competitors in both in-domain and zero-shot tests.

In summary, TOTEM aims to provide a unified framework that handles a range of time series analysis tasks and generalizes across data domains.
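
The idea of non-overlapping discrete tokens described under Technical Details can be illustrated with a minimal sketch. This is not the authors' implementation: in a VQ-VAE a learned encoder first maps the series to latent vectors, and the codebook is learned jointly with it. Here, as a loudly simplified assumption, raw fixed-length patches stand in for the encoder output, and the codebook `CODEBOOK`, the patch length of 4, and all function names are hypothetical values chosen for illustration.

```python
# Illustrative sketch only. The raw-patch "encoder" and the hand-written
# codebook are hypothetical stand-ins for TOTEM's learned components.

def split_into_patches(series, patch_len):
    """Cut a 1-D series into non-overlapping patches, dropping any remainder."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patch, codebook):
    """Return the index of the codebook vector nearest to the patch."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(patch, codebook[k]))

def tokenize(series, codebook):
    """Map a series to a list of integer token ids, one per patch."""
    patch_len = len(codebook[0])
    return [quantize(p, codebook) for p in split_into_patches(series, patch_len)]

def detokenize(tokens, codebook):
    """Rebuild an approximate series by concatenating codebook vectors."""
    out = []
    for t in tokens:
        out.extend(codebook[t])
    return out

# Hypothetical codebook with three "shapes" of length 4.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],  # token 0: flat
    [0.0, 1.0, 2.0, 3.0],  # token 1: rising
    [3.0, 2.0, 1.0, 0.0],  # token 2: falling
]

series = [0.1, 0.9, 2.2, 2.8, 3.1, 1.9, 1.2, -0.1, 0.0, 0.1, -0.1, 0.0]
tokens = tokenize(series, CODEBOOK)
reconstruction = detokenize(tokens, CODEBOOK)
print(tokens)  # [1, 2, 0]
```

Because every series becomes a sequence of ids drawn from one shared vocabulary, the same downstream model can in principle consume data from different domains, which is the property the generalist setting relies on.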