DyGraphformer: Transformer combining dynamic spatio-temporal graph network for multivariate time series forecasting

Shuo Han, Yaling Xun, Jianghui Cai, Haifeng Yang, Yanfeng Li
DOI: https://doi.org/10.1016/j.neunet.2024.106776
2024-10-17
Abstract: Transformer-based models demonstrate tremendous potential for Multivariate Time Series (MTS) forecasting due to their ability to capture long-term temporal dependencies through the self-attention mechanism. However, effectively modeling the spatial correlations across series in MTS remains a challenge for Transformers. Although Graph Neural Networks (GNNs) are competent at modeling spatial dependencies across series, existing methods assume static relationships between variables, which does not align with the time-varying spatial dependencies in real-world series. Therefore, we propose DyGraphformer, which integrates graph convolution into the Transformer to help it model spatial dependencies effectively, while also dynamically inferring time-varying spatial dependencies by combining historical spatial information. In DyGraphformer, the decoder module, which involves complex recursion, is abandoned to accelerate model execution. First, the input is embedded using DSW (Dimension Segment Wise) embedding, integrated with position and node-level embeddings to preserve temporal and spatial information. Then, a time self-attention layer and a dynamic graph convolutional layer are constructed to capture the temporal dependencies and spatial dependencies of the multivariate time series, respectively. The dynamic graph convolutional layer uses a Gated Recurrent Unit (GRU) to obtain historical spatial dependencies, and integrates the series features of the current time to perform graph structure inference in multiple subspaces. Finally, to fully utilize the spatio-temporal information at different scales, DyGraphformer performs hierarchical encoder learning for the final forecasting. Extensive experimental results on seven real-world datasets demonstrate that DyGraphformer outperforms state-of-the-art baseline methods, with comparisons including Transformer-based and GNN-based methods.
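To make two of the steps in the abstract concrete, the following is a minimal, illustrative sketch in plain Python, not the authors' implementation. It shows (1) splitting each variable's series into fixed-length segments, which is the idea behind Dimension Segment Wise embedding, and (2) blending a previous graph with one inferred from current features. The function names `dsw_segments` and `update_adjacency`, the dot-product similarity, and the fixed scalar `gate` are all assumptions made for illustration; the paper uses a learned GRU and multi-subspace graph inference, and it also applies learned projections and position and node embeddings that are omitted here.

```python
import math


def dsw_segments(series, seg_len):
    """Split a T x D series (list of T rows, D values each) into
    per-variable segments of length seg_len.

    Returns a list of D variables, each a list of T // seg_len segments.
    """
    num_steps = len(series)
    if num_steps % seg_len != 0:
        raise ValueError("series length must be a multiple of seg_len")
    num_vars = len(series[0])
    return [
        [[series[t][d] for t in range(start, start + seg_len)]
         for start in range(0, num_steps, seg_len)]
        for d in range(num_vars)
    ]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def update_adjacency(prev_adj, node_feats, gate=0.5):
    """Blend the previous adjacency with one inferred from current features.

    ASSUMPTION: a fixed scalar gate stands in for the learned GRU gates,
    and dot-product similarity stands in for learned graph inference.
    """
    n = len(node_feats)
    scores = [
        [sum(a * b for a, b in zip(node_feats[i], node_feats[j]))
         for j in range(n)]
        for i in range(n)
    ]
    current = [softmax(row) for row in scores]
    return [
        [(1 - gate) * prev_adj[i][j] + gate * current[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy input: 4 time steps, 2 variables.
series = [[1, 10], [2, 20], [3, 30], [4, 40]]
segments = dsw_segments(series, seg_len=2)

# Use each variable's first segment as its node feature at this step.
node_feats = [var_segments[0] for var_segments in segments]
prev_adj = [[1.0, 0.0], [0.0, 1.0]]
adj = update_adjacency(prev_adj, node_feats, gate=0.5)
```

Because each row of the inferred adjacency is a softmax and the gate forms a convex combination, the rows of `adj` still sum to 1 whenever the rows of `prev_adj` do, which keeps the graph normalized from one step to the next in this simplified setting.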