Abstract: Multivariate time series (MTS) analysis prevails in real-world applications such as finance, climate science and healthcare. The various self-attention mechanisms, the backbone of the state-of-the-art Transformer-based models, efficiently discover the temporal dependencies, yet cannot well capture the intricate cross-correlation between different features of MTS data, which inherently stems from complex dynamical systems in practice. To this end, we propose a novel correlated attention mechanism, which not only efficiently captures feature-wise dependencies, but can also be seamlessly integrated within the encoder blocks of existing well-known Transformers to gain efficiency improvement. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregates representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation. When combined with prevalent Transformer baselines, the correlated attention mechanism constitutes a better alternative for encoder-only architectures, which are suitable for a wide range of tasks including imputation, anomaly detection and classification. Extensive experiments on the aforementioned tasks consistently underscore the advantages of the correlated attention mechanism in enhancing base Transformer models, and demonstrate our state-of-the-art results in imputation, anomaly detection and classification.
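The abstract's description of attention across feature channels can be made concrete with a rough sketch. The code below is not the paper's formulation; it is a simplified illustration under stated assumptions: lags are applied as circular shifts with `np.roll`, the cross-covariance is left uncentered, every lag in a fixed hypothetical `lags` tuple is averaged rather than selected, and the function name `lagged_feature_attention` is invented here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def lagged_feature_attention(q, k, v, lags=(0, 1, 2)):
    """Illustrative feature-wise attention over a few lags (hypothetical helper).

    q, k, v have shape (seq_len, n_features). For each lag, keys and values
    are circularly shifted in time, a (n_features, n_features) uncentered
    cross-covariance between query and key channels is computed, and a
    softmax over key channels gives the weights used to mix value channels.
    """
    q, k, v = (np.asarray(a, dtype=float) for a in (q, k, v))
    seq_len = q.shape[0]
    out = np.zeros_like(v)
    for lag in lags:
        k_lag = np.roll(k, lag, axis=0)        # assumption: circular time shift
        v_lag = np.roll(v, lag, axis=0)
        cross_cov = q.T @ k_lag / seq_len      # (n_features, n_features)
        weights = softmax(cross_cov, axis=-1)  # each row sums to 1
        out += v_lag @ weights.T               # mix value channels per query channel
    return out / len(lags)                     # assumption: plain average over lags


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
y = lagged_feature_attention(x, x, x)
print(y.shape)  # (16, 4)
```

Note that the attention matrix here has size `n_features × n_features` rather than `seq_len × seq_len`, which is the sense in which the mechanism operates across feature channels instead of across time steps.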

What problem does this paper attempt to address?

The paper addresses a limitation of existing Transformer-based models for multivariate time series (MTS) analysis: they capture temporal dependencies well but struggle to capture the complex cross-correlations between different features of MTS data. Specifically, it targets the following problems:

1. **Improving the performance of Transformer models in non-predictive tasks**: Existing Transformer-based models excel at capturing temporal dependencies but are limited in handling cross-correlations between features. The paper therefore proposes a correlated attention mechanism that efficiently captures feature-wise dependencies and integrates into well-known Transformer models, improving their performance on non-predictive tasks such as imputation, anomaly detection, and classification.
2. **Capturing lagged cross-correlations**: In MTS data, a change in one variable may show up in another variable only after a delay. Although this phenomenon is common in practice, existing Transformer-based methods have not fully exploited it in target applications. The proposed correlated attention mechanism is designed specifically to capture these lagged cross-correlations.
3. **Enhancing existing Transformer architectures**: The mechanism learns both instantaneous and lagged cross-correlations between variables, and it can be integrated into the encoder blocks of existing Transformer models (such as the vanilla Transformer and the Non-stationary Transformer) rather than requiring a new architecture.

In summary, the core contribution of the paper is a correlated attention mechanism that captures the cross-correlations between different features in MTS data and, when integrated into existing Transformer models, improves their performance on non-predictive tasks.
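To illustrate what a lagged cross-correlation looks like in data, here is a small sketch that estimates the delay between two synthetic series. It is an illustration of the phenomenon only, not part of the paper's method. The helper name `lagged_cross_correlation`, the circular shift, the noise level, and the 3-step delay are all assumptions chosen for the example.

```python
import numpy as np


def lagged_cross_correlation(x, y, max_lag):
    """Correlation between x shifted by `lag` steps and y, for lag = 0..max_lag.

    Hypothetical helper: both series are standardized, and the shift is
    circular (np.roll), which is a simplifying assumption.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return np.array([np.mean(np.roll(x, lag) * y) for lag in range(max_lag + 1)])


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
# y follows x with a 3-step delay plus a small amount of noise.
y = np.roll(x, 3) + 0.1 * rng.standard_normal(200)

corr = lagged_cross_correlation(x, y, max_lag=10)
best_lag = int(np.argmax(corr))
print(best_lag)  # the lag with the strongest correlation
```

The instantaneous correlation (lag 0) between `x` and `y` is close to zero here, so a model that only looks at same-time relationships between features would miss the dependency, while scanning over lags recovers it.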

Correlated Attention in Transformers for Multivariate Time Series

Foreformer: an Enhanced Transformer-Based Framework for Multivariate Time Series Forecasting

Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer

Generalizable Memory-driven Transformer for Multivariate Long Sequence Time-series Forecasting

Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism

VCformer: Variable Correlation Transformer with Inherent Lagged Correlation for Multivariate Time Series Forecasting

Replacing self-attentions with convolutional layers in multivariate long sequence time-series forecasting

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

LSEAttention is All You Need for Time Series Forecasting

MR-Transformer: Multiresolution Transformer for Multivariate Time Series Prediction

Attention as Robust Representation for Time Series Forecasting

Causal-Transformer: Spatial-temporal Causal Attention-Based Transformer for Time Series Prediction

Are Self-Attentions Effective for Time Series Forecasting?

FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification

Spatial-Temporal Convolutional Transformer Network for Multivariate Time Series Forecasting

Transformers with Sparse Attention for Granger Causality

Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting

A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

DCT-Based Decorrelated Attention for Vision Transformers