Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning

Jiewen Deng, Renhe Jiang, Jiaqi Zhang, Xuan Song

2024-05-06

Abstract: Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST forecasting is more challenging because it possesses (i) high-dimensional and complex internal structures and (ii) dynamic heterogeneity caused by temporal, spatial, and modality variations. In this study, we propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL, which aims to uncover latent patterns from temporal, spatial, and modality perspectives while quantifying dynamic heterogeneity. Experimental results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines. Model implementation is available at
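The abstract describes MoST data as spatio-temporal data with an extra modality axis. As a rough illustration only, the sketch below assumes a time × node × modality layout stored as nested Python lists and shows how such a series could be cut into (history, horizon) pairs for forecasting. The layout, the function name `make_windows`, and the window sizes are hypothetical choices made here, not details taken from the paper or its released code.

```python
from typing import List, Tuple

# Hypothetical layout: data[t][n][m] holds the value of modality m
# (e.g. one traffic-demand type or one pollutant) at node n and time step t.
Tensor3 = List[List[List[float]]]


def make_windows(data: Tensor3, history: int, horizon: int) -> List[Tuple[Tensor3, Tensor3]]:
    """Slide over the time axis, pairing `history` past steps with the next `horizon` steps."""
    if history <= 0 or horizon <= 0:
        raise ValueError("history and horizon must be positive")
    pairs = []
    # The last valid start index leaves room for both the input and the target window.
    for start in range(len(data) - history - horizon + 1):
        x = data[start:start + history]
        y = data[start + history:start + history + horizon]
        pairs.append((x, y))
    return pairs


# Toy MoST series: 10 time steps, 3 nodes, 2 modalities.
T, N, M = 10, 3, 2
toy = [[[float(t * 100 + n * 10 + m) for m in range(M)] for n in range(N)] for t in range(T)]
windows = make_windows(toy, history=4, horizon=2)
print(len(windows))  # 5 windows: start indices 0..4
x0, y0 = windows[0]
print(len(x0), len(x0[0]), len(x0[0][0]))  # 4 3 2
print(y0[0][0][0])  # 400.0 (time step 4, node 0, modality 0)
```

Each window keeps all nodes and modalities together, so a model sees cross-modality context at every step; a practical pipeline would normally hold these arrays as NumPy or PyTorch tensors rather than nested lists.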

Machine Learning

What problem does this paper attempt to address?

This paper addresses forecasting of multi-modality spatio-temporal (MoST) data, which is common in real-world monitoring systems, for example data covering several types of transportation demand or several air quality indicators. Compared with ordinary spatio-temporal data, MoST data carries additional modality information. This makes forecasting harder for two reasons: the data is high-dimensional with a complex internal structure, and it exhibits dynamic heterogeneity caused by variations across time, space, and modality.

To tackle this, the paper proposes MoSSL (Multi-Modality Spatio-Temporal Learning via Self-Supervised Learning), a framework that explores latent patterns and quantifies dynamic heterogeneity from the temporal, spatial, and modality perspectives. MoSSL consists of four main parts: (1) a MoST encoder that captures spatial, temporal, and modality information; (2) multi-modality data augmentation, which helps the model capture correlations between patterns and integrate MoST domain information; (3) Global Self-Supervised Learning (GSSL), which identifies diverse pattern variations from different perspectives; and (4) Modal Self-Supervised Learning (MSSL), which further strengthens the learned representations of inter-modality and intra-modality features.

Experiments on two real-world MoST datasets, covering traffic flow and air quality forecasting tasks, show that MoSSL outperforms existing state-of-the-art baselines. Ablation studies demonstrate the contribution of MoSSL's key components, and case studies illustrate the effects of modality augmentation and heterogeneity decoupling.

In summary, the paper asks how self-supervised learning can be used effectively for MoST forecasting, and answers by capturing and quantifying heterogeneity along the temporal, spatial, and modality dimensions to make forecasts more accurate and comprehensive.
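The summary does not say which augmentation or self-supervised objective MoSSL actually uses. As a hedged illustration of the general idea behind components (2) to (4), the sketch below pairs a hypothetical modality-masking augmentation with a standard InfoNCE contrastive loss. The names `mask_modalities` and `info_nce`, the masking rule, and the temperature are assumptions made here; the hand-written vectors stand in for encoder outputs. This is a generic self-supervised pattern, not the authors' method.

```python
import math
import random
from typing import List

# Hypothetical layout: window[t][n][m] for time step t, node n, modality m.
Tensor3 = List[List[List[float]]]


def mask_modalities(window: Tensor3, drop_prob: float, rng: random.Random) -> Tensor3:
    """Hypothetical augmentation: zero out whole modalities across every time step and node."""
    num_modalities = len(window[0][0])
    # One keep/drop decision per modality, shared by all (t, n) positions.
    keep = [rng.random() >= drop_prob for _ in range(num_modalities)]
    return [[[v if keep[m] else 0.0 for m, v in enumerate(node)] for node in step] for step in window]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(anchors: List[List[float]], positives: List[List[float]], temperature: float = 0.5) -> float:
    """Standard InfoNCE: anchors[i] should be closest to positives[i] among all positives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates (the positive plus negatives).
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(s - top) for s in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denom - logits[i]
    return total / len(anchors)


rng = random.Random(0)
# Toy window: 2 time steps, 4 nodes, 3 modalities.
window = [[[1.0, 2.0, 3.0] for _ in range(4)] for _ in range(2)]
augmented = mask_modalities(window, drop_prob=0.5, rng=rng)
print(augmented[0][0])  # which modalities are zeroed depends on the seed

# Hand-written stand-ins for encoder outputs of the original and augmented views.
z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_aug = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
aligned = info_nce(z, z_aug)
shuffled = info_nce(z, [z_aug[1], z_aug[2], z_aug[0]])
print(aligned < shuffled)  # True: matched views give a lower loss
```

Minimizing this loss pulls each original view toward its own augmented view and away from the other samples in the list; whether MoSSL's GSSL and MSSL modules follow this pattern would need to be checked against the paper itself.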

Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction

Spatial-temporal Forecasting for Regions without Observations

Wavelet-Driven Spatiotemporal Predictive Learning: Bridging Frequency and Time Variations

Rethinking self-supervised learning for time series forecasting: A temporal perspective

STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning

Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale

CMS-LSTM: Context Embedding and Multi-Scale Spatiotemporal Expression LSTM for Predictive Learning

AutoSTL: Automated Spatio-Temporal Multi-Task Learning

Towards Effective Fusion and Forecasting of Multimodal Spatio-temporal Data for Smart Mobility

Machine Learning for Spatiotemporal Sequence Forecasting: A Survey

Deep Spatial Prediction via Heterogeneous Multi-Source Self-Supervision

Is Single Enough? A Joint Spatiotemporal Feature Learning Framework for Multivariate Time Series Prediction

STJLA: A Multi-Context Aware Spatio-Temporal Joint Linear Attention Network for Traffic Forecasting

STSD: Modeling Spatial Temporal Staticity and Dynamicity in Traffic Forecasting

STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

Meta-STMF: Meta-Learning Based Spatial Temporal Prediction Model Fusion Approach

SA-JSTN: Self-Attention Joint Spatiotemporal Network for Temperature Forecasting

Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data

Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting