Graphformer: Adaptive graph correlation transformer for multivariate long sequence time series forecasting

Yijie Wang, Hao Long, Linjiang Zheng, Jiaxing Shang
DOI: https://doi.org/10.1016/j.knosys.2023.111321
IF: 8.139
2024-01-04
Knowledge-Based Systems
Abstract: Accurate long sequence time series forecasting (LSTF) remains a key challenge due to its complex time-dependent nature. Multivariate time series forecasting methods inherently assume that variables are interrelated and that the future state of each variable depends not only on its history but also on other variables. However, most existing methods, such as Transformer, cannot effectively exploit the potential spatial correlation between variables. To cope with the above problems, we propose a Transformer-based LSTF model, called Graphformer, which can efficiently learn complex temporal patterns and dependencies between multiple variables. First, in the encoder's self-attention downsampling layer, Graphformer replaces the standard convolutional layer with a dilated convolutional layer to efficiently capture long-term dependencies between time series at different granularity levels. Meanwhile, Graphformer replaces the self-attention mechanism with a graph self-attention mechanism that can automatically infer the implicit sparse graph structure from the data, showing better generality for time series without explicit graph structure and learning implicit spatial dependencies between sequences. In addition, Graphformer uses a temporal inertia module to enhance the sensitivity of future time steps to recent inputs, and a multi-scale feature fusion operation to extract temporal correlations at different granularity levels by slicing and fusing feature maps to improve model accuracy and efficiency. Our proposed Graphformer can significantly improve long sequence time series forecasting accuracy compared with SOTA Transformer-based models.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper addresses two key challenges in Long Sequence Time Series Forecasting (LSTF): the complex temporal dependencies within each series and the potential spatial correlations between variables in multivariate data. It proposes Graphformer, a Transformer-based model designed to capture both long-term temporal dependencies and spatial dependencies in multivariate time series. Specifically, Graphformer consists of the following components:

1. **Dilated Convolution in the Down-Sampling Layer**: In the self-attention down-sampling layer of the encoder, Graphformer replaces the standard convolution layer with dilated causal convolution to efficiently capture long-term dependencies between time series at different granularity levels.
2. **Graph Self-Attention Mechanism**: Graphformer introduces a graph self-attention mechanism that automatically infers an implicit sparse graph structure from the data. This is particularly useful for time series without an explicit graph structure, and it lets the model learn implicit spatial dependencies between sequences.
3. **Multi-Scale Feature Fusion**: To extract temporal dependencies at different scales, Graphformer slices the feature map and fuses the slices through a multi-scale pyramid network to capture cross-scale feature information.
4. **Temporal Inertia Module**: Graphformer uses a temporal inertia module to make future time steps more sensitive to recent inputs, which helps improve the accuracy of the model.
5. **Overall Architecture**: Graphformer follows an encoder-decoder structure. The encoder stacks three multi-head sparse graph self-attention modules, two self-attention down-sampling modules, and a multi-scale feature fusion layer; the decoder contains a masked sparse graph self-attention module and a sparse graph self-attention module.
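To make the dilated causal convolution in point 1 more concrete, here is a minimal pure-Python sketch. It is not taken from the paper: the function names `dilated_causal_conv1d` and `receptive_field`, the two-tap averaging kernel, and the dilation rates are all illustrative assumptions, and the sketch leaves out the down-sampling (pooling or striding) step itself. It only shows how a dilation factor lets a small kernel reach further back in time without looking at future values.

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """Apply a 1-D dilated causal convolution to a list of floats.

    Output step t only sees x[t], x[t - d], x[t - 2d], ... (missing history
    is treated as zero), so no future value leaks into the result.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation  # step back j * dilation positions
            if idx >= 0:
                acc += weight * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input steps that can influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Hypothetical kernel: kernel[0] weights the current step, kernel[1] the lagged one.
kernel = [0.5, 0.5]

dense = dilated_causal_conv1d(series, kernel, dilation=1)
dilated = dilated_causal_conv1d(series, kernel, dilation=2)

print(dense)    # [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
print(dilated)  # [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(receptive_field(2, [1, 2, 4]))  # 8
```

With dilation 2, each output averages the current value with the value two steps earlier instead of one step earlier. Stacking three such layers with assumed dilations 1, 2, and 4 covers 8 input steps with only two weights per layer, which is the usual motivation for choosing dilated over standard convolution when long-range dependencies matter.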
Through these methods, Graphformer can significantly improve the accuracy of long sequence time series forecasting while maintaining low computational costs, especially when compared to existing Transformer-based models. Experimental results show that Graphformer outperforms state-of-the-art models on real-world datasets from multiple domains, including energy, transportation, and disease.
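The idea behind the graph self-attention mechanism (point 2 above) can be illustrated in a similarly hedged way. The summary does not say how Graphformer infers its sparse graph, so the sketch below assumes one common approach: score every pair of variables by the scaled dot product of their embeddings, keep only the `top_k` strongest edges per variable, and apply a softmax over the kept edges. The function name `sparse_graph_attention`, the top-k rule, and the toy embeddings are assumptions for illustration, not the paper's method.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def sparse_graph_attention(embeddings, values, top_k):
    """Attend over variables, keeping only the top_k strongest edges per node.

    embeddings: one vector per variable, used to score pairwise affinity.
    values:     one feature vector per variable, mixed according to the graph.
    Returns (adjacency, outputs); each adjacency row sums to 1 and has
    exactly top_k non-zero entries.
    """
    n = len(embeddings)
    scale = math.sqrt(len(embeddings[0]))
    adjacency, outputs = [], []
    for i in range(n):
        scores = [dot(embeddings[i], embeddings[j]) / scale for j in range(n)]
        # Indices of the top_k highest-scoring neighbours (the inferred edges).
        keep = sorted(range(n), key=lambda j: scores[j], reverse=True)[:top_k]
        # Softmax restricted to the kept edges; all other weights stay zero.
        peak = max(scores[j] for j in keep)
        exp = {j: math.exp(scores[j] - peak) for j in keep}
        total = sum(exp.values())
        row = [exp[j] / total if j in exp else 0.0 for j in range(n)]
        adjacency.append(row)
        outputs.append([
            sum(row[j] * values[j][d] for j in range(n))
            for d in range(len(values[0]))
        ])
    return adjacency, outputs


# Toy data (hypothetical): variables 0 and 1 are similar, as are 2 and 3.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
values = [[1.0], [2.0], [3.0], [4.0]]

adjacency, outputs = sparse_graph_attention(embeddings, values, top_k=2)
for row in adjacency:
    print([round(w, 3) for w in row])
```

In this toy example each variable ends up connected only to itself and its most similar partner, so the inferred adjacency matrix is block-structured even though no graph was supplied. A trained model would learn the embeddings rather than fix them by hand.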