GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing

Chengqing Yu,Fei Wang,Zezhi Shao,Tangwen Qian,Zhao Zhang,Wei Wei,Yongjun Xu
2024-05-19
Abstract:Multivariate time series forecasting (MTSF) is crucial for decision-making to precisely forecast the future values/trends, based on the complex relationships identified from historical observations of multiple sequences. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have gradually become the theme of MTSF model as their powerful capability in mining spatial-temporal dependencies, but almost of them heavily rely on the assumption of historical data integrity. In reality, due to factors such as data collector failures and time-consuming repairment, it is extremely challenging to collect the whole historical observations without missing any variable. In this case, STGNNs can only utilize a subset of normal variables and easily suffer from the incorrect spatial-temporal dependency modeling issue, resulting in the degradation of their forecasting performance. To address the problem, in this paper, we propose a novel Graph Interpolation Attention Recursive Network (named GinAR) to precisely model the spatial-temporal dependencies over the limited collected data for forecasting. In GinAR, it consists of two key components, that is, interpolation attention and adaptive graph convolution to take place of the fully connected layer of simple recursive units, and thus are capable of recovering all missing variables and reconstructing the correct spatial-temporal dependencies for recursively modeling of multivariate time series data, respectively. Extensive experiments conducted on five real-world datasets demonstrate that GinAR outperforms 11 SOTA baselines, and even when 90% of variables are missing, it can still accurately predict the future values of all variables.
Machine Learning,Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **Multivariate Time - Series Forecasting (MTSF) in the presence of missing variables**. Specifically, traditional Spatio - Temporal Graph Neural Networks (STGNNs) usually rely on complete historical data to mine spatio - temporal dependencies when dealing with multivariate time - series forecasting. However, in practical applications, due to reasons such as data - collecting device failures or maintenance, it is difficult to obtain complete historical observation data for all variables, resulting in missing data for some variables. In this case, STGNNs can only utilize some normal variables and are prone to incorrect spatio - temporal dependency relationship modeling, thus affecting the forecasting performance. To solve this problem, the paper proposes a new model - **Graph Interpolation Attention Recursive Network (GinAR)**. This model can recover missing variables and reconstruct correct spatio - temporal dependency relationships through the interpolation attention mechanism and adaptive graph convolution in the case of missing some variables, thereby achieving more accurate multivariate time - series forecasting. ### Main problem summary: 1. **Data missing problem**: In reality, multivariate time - series data often has long - term missing of some variables, which makes it difficult for traditional STGNNs to model effectively. 2. **Spatio - temporal dependency relationship modeling**: When some variables are missing, existing STGNNs are prone to capturing incorrect spatio - temporal dependency relationships, leading to a decline in forecasting performance. ### Solutions: - **Interpolation Attention**: It is used to restore a reasonable representation of missing variables, avoid directly mining missing variables without valuable patterns, and thus correct the time - dependency relationship. - **Adaptive Graph Convolution**: It is used to reconstruct the spatial correlations among all variables and ensure accurate modeling even in the case of missing some variables. Through these methods, GinAR can still accurately predict future values even when up to 90% of the variables are missing.