Abstract: With recent advances in building foundation models for texts and video data, there is a surge of interest in foundation models for time series. A family of models has been developed, utilizing a temporal auto-regressive generative Transformer architecture, whose effectiveness has been proven in Large Language Models. While the empirical results are promising, almost all existing time series foundation models have only been tested on well-curated "benchmark" datasets very similar to texts. However, real-world time series exhibit unique challenges, such as variable channel sizes across domains, missing values, and varying signal sampling intervals due to the multi-resolution nature of real-world data. Additionally, the uni-directional nature of temporally auto-regressive decoding limits the incorporation of domain knowledge, such as physical laws expressed as partial differential equations (PDEs). To address these challenges, we introduce the Time Diffusion Transformer (TimeDiT), a general foundation model for time series that employs a denoising diffusion paradigm instead of temporal auto-regressive generation. TimeDiT leverages the Transformer architecture to capture temporal dependencies and employs diffusion processes to generate high-quality candidate samples without imposing stringent assumptions on the target distribution via novel masking schemes and a channel alignment strategy. Furthermore, we propose a finetuning-free model editing strategy that allows the seamless integration of external knowledge during the sampling process without updating any model parameters. Extensive experiments conducted on a variety of tasks, such as forecasting, imputation, and anomaly detection, demonstrate the effectiveness of TimeDiT.
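
The abstract mentions masking schemes combined with a denoising diffusion process but gives no details. As a rough illustration only, the sketch below applies the standard DDPM forward-noising formula to the masked (target) entries of a series while leaving observed entries untouched. The function names, the future-horizon mask, and the value of `alpha_bar_t` are assumptions made for this example, not TimeDiT's actual design.

```python
import numpy as np


def masked_forward_diffusion(x0, target_mask, alpha_bar_t, rng):
    """Noise only the masked (target) entries; keep observed entries clean.

    x0:          clean series, shape (time, channels)
    target_mask: boolean array, True where values must be generated
    alpha_bar_t: cumulative signal-retention coefficient in (0, 1]
    """
    noise = rng.standard_normal(x0.shape)
    # Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
    x_noisy = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
    # Observed (conditioning) positions stay at their clean values.
    x_t = np.where(target_mask, x_noisy, x0)
    return x_t, noise


def future_mask(length, channels, horizon):
    """Hypothetical forecasting mask: the last `horizon` steps are targets."""
    mask = np.zeros((length, channels), dtype=bool)
    mask[length - horizon:, :] = True
    return mask


rng = np.random.default_rng(0)
# Toy series: one sine wave copied across 3 channels.
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 24))[:, None] * np.ones((1, 3))
mask = future_mask(length=24, channels=3, horizon=8)
x_t, noise = masked_forward_diffusion(x0, mask, alpha_bar_t=0.5, rng=rng)
```

Swapping `future_mask` for a mask over missing entries would turn the same routine into an imputation-style setup, which is one plausible reading of how a single masking interface could serve several tasks.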

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address several key issues in time series analysis and proposes a general time series foundation model, TimeDiT (Time Diffusion Transformer). Specifically:

1. **Handling Diverse Time Series Data**: Existing foundational models for time series are typically tested only on carefully curated "benchmark" datasets, which are very similar to text data. However, real-world time series data present unique challenges, such as varying channel sizes across different domains, missing values, and multi-resolution characteristics.
2. **Unidirectionality Limitation of Autoregressive Decoding**: Existing models primarily use autoregressive generation methods, which limit the incorporation of prior knowledge such as physical laws expressed through partial differential equations (PDEs).
3. **Need for a Unified Framework**: Current methods lack a unified framework to handle diverse data inputs, often prioritizing performance on carefully curated datasets while neglecting the complexities of real-world scenarios.

To address these issues, the paper proposes TimeDiT, a time series foundation model based on a diffusion transformer. TimeDiT leverages the Transformer architecture to capture temporal dependencies and employs a diffusion process to generate high-quality candidate samples without imposing strict assumptions on the target distribution. Additionally, TimeDiT introduces a model editing strategy that integrates external knowledge seamlessly during the sampling process without updating any model parameters.

### Main Contributions

1. **Introduction of the TimeDiT Model**: TimeDiT combines the advantages of diffusion models and Transformers, providing a flexible architecture that can adapt to various downstream tasks. The model includes a comprehensive masking mechanism to ensure a standardized training process capable of handling diverse input shapes and distributions.
2. **Addressing Real-World Challenges**: TimeDiT directly handles multivariate inputs and generates coherent target time series through a denoising process, effectively addressing issues such as missing values and multi-resolution data. Furthermore, TimeDiT can generate time series that conform to known physical laws and domain-specific requirements, enhancing its applicability in scientific and engineering fields.
3. **Extensive Experimental Validation**: TimeDiT achieves state-of-the-art or highly competitive performance across multiple domains and tasks, including probabilistic forecasting, imputation, anomaly detection, and data generation. TimeDiT performs excellently in both in-domain and zero-shot settings, demonstrating its effectiveness and efficiency as a foundational model.

In summary, by combining the strengths of diffusion models and Transformers, TimeDiT offers a flexible and powerful solution capable of efficient application across various time series tasks.
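
The summary describes integrating external knowledge during sampling without updating model parameters, but does not say how. One common way to realise that idea is to nudge a candidate sample down the gradient of a physics-residual energy. The sketch below does this for a simple decay law dx/dt = -k x, used here as a stand-in for the PDEs mentioned above. All function names, the choice of law, and the step sizes are assumptions for illustration; in a real sampler the correction would be applied inside each reverse diffusion step rather than to a finished sample.

```python
import numpy as np


def decay_residual(x, k, dt):
    """Residual of the forward-Euler discretisation of dx/dt = -k * x."""
    return (x[1:] - x[:-1]) / dt + k * x[:-1]


def physics_energy(x, k, dt):
    """Half the squared residual norm; zero when the law holds exactly."""
    r = decay_residual(x, k, dt)
    return 0.5 * float(np.sum(r ** 2))


def physics_energy_grad(x, k, dt):
    """Analytic gradient of physics_energy with respect to x."""
    r = decay_residual(x, k, dt)
    grad = np.zeros_like(x)
    grad[1:] += r / dt               # d r_i / d x_{i+1} = 1 / dt
    grad[:-1] += r * (k - 1.0 / dt)  # d r_i / d x_i = k - 1 / dt
    return grad


def guide_sample(x, k, dt, step_size, n_steps):
    """Edit a candidate sample by gradient descent on the physics energy.

    No model weights are involved: only the sample itself is changed.
    """
    x = x.copy()
    for _ in range(n_steps):
        x -= step_size * physics_energy_grad(x, k, dt)
    return x


rng = np.random.default_rng(0)
t = np.arange(32, dtype=float)
# Stand-in for a model-generated candidate: true decay plus noise.
candidate = np.exp(-0.1 * t) + 0.05 * rng.standard_normal(32)
edited = guide_sample(candidate, k=0.1, dt=1.0, step_size=0.1, n_steps=50)
energy_before = physics_energy(candidate, k=0.1, dt=1.0)
energy_after = physics_energy(edited, k=0.1, dt=1.0)
```

Comparing `energy_before` with `energy_after` shows how far the edit moved the sample toward the constraint; the step size trades off fidelity to the original candidate against adherence to the law.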

TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model
