Abstract: Operational weather forecasting systems rely on computationally expensive physics-based models. Recently, transformer-based models have shown remarkable potential in weather forecasting, achieving state-of-the-art results. However, transformers are discrete and physics-agnostic models, which limits their ability to learn the continuous spatio-temporal features of the dynamical weather system. We address this issue with STC-ViT, a Spatio-Temporal Continuous Vision Transformer for weather forecasting. STC-ViT incorporates continuous-time Neural ODE layers with a multi-head attention mechanism to learn the continuous weather evolution over time. The attention mechanism is encoded as a differentiable function in the transformer architecture to model the complex weather dynamics. Further, we define a customised physics-informed loss for STC-ViT which penalizes the model's predictions for deviating from physical laws. We evaluate STC-ViT against an operational Numerical Weather Prediction (NWP) model and several deep-learning-based weather forecasting models. STC-ViT, trained on 1.5-degree 6-hourly data, demonstrates computational efficiency and competitive performance compared to state-of-the-art data-driven models trained on higher-resolution data for global forecasting.
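
To make the idea of encoding attention as a differentiable function inside a Neural ODE more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes single-head self-attention (rather than multi-head) as the vector field dz/dt = Attn(z), and a fixed-step explicit Euler solver in place of an adaptive ODE solver. The names `attention_field` and `odeint_euler`, the weight shapes, and the step count are hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_field(z, w_q, w_k, w_v):
    """Vector field dz/dt given by single-head self-attention (assumption)."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def odeint_euler(z0, t0, t1, n_steps, field):
    """Integrate dz/dt = field(z) from t0 to t1 with explicit Euler steps."""
    z = z0.copy()
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        z = z + dt * field(z)
    return z

# Toy example: 4 tokens (e.g. image patches) with 8 features each.
rng = np.random.default_rng(0)
n_tokens, d = 4, 8
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
z0 = rng.normal(size=(n_tokens, d))
z1 = odeint_euler(z0, 0.0, 1.0, 10,
                  lambda z: attention_field(z, w_q, w_k, w_v))
```

In this sketch the token state `z` evolves continuously between `t0` and `t1`, so the integration horizon can in principle be chosen freely instead of being tied to a fixed stack of discrete layers.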

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve the trade-off problem between computational efficiency and accuracy in traditional physics-based numerical weather prediction (NWP) models. Specifically, the authors propose a new method named **STC-ViT** (Spatio-Temporal Continuous Vision Transformer) to improve the continuous spatio-temporal modeling ability of the weather forecasting system.

#### Main problems:

1. **Computational efficiency and accuracy**: Traditional physics-based numerical weather forecasting systems are accurate but extremely computationally costly and have cumulative errors, requiring a large amount of computational resources (Palmer et al., 2005; Andersson, 2022). Therefore, there is an urgent need for a method that can ensure accuracy while improving computational efficiency.
2. **Limitations of discrete models**: Existing deep-learning models based on Transformers perform well in weather forecasting, but they are essentially discrete and ignore the basic physical laws of the atmosphere, which limits their ability to learn continuous spatio-temporal features (Fonseca et al., 2023).
3. **Modeling of continuous spatio-temporal dynamics**: Weather data has significant spatio-temporal continuity and dynamic evolution characteristics, which pose challenges for generating accurate forecasts. Existing discrete models have difficulty capturing these complex spatio-temporal changes.

#### Solutions:

To solve the above problems, the authors propose **STC-ViT**, and its main innovations include:

- **Continuous spatio-temporal attention mechanism**: By introducing the continuous-time neural ordinary differential equation (Neural ODE) layer and the multi-head attention mechanism, STC-ViT can learn the continuous evolution process of the weather system, thereby better capturing spatio-temporal continuity.
- **Physics-constrained loss function**: To ensure that the model predictions conform to the atmospheric physical laws, the authors design a customized physics-informed loss function, which constrains the model predictions through soft penalty terms to make them closer to the real physical behavior.
- **Pre-processing step**: By calculating the time derivatives of weather variables as a pre-processing step, the effect of feature extraction is further enhanced.

Through these improvements, STC-ViT not only improves computational efficiency but also shows performance comparable to existing state-of-the-art data-driven models in global weather forecasting, especially when trained on lower-resolution data.

### Summary

The main goal of this paper is to develop a weather forecasting system that can operate efficiently and predict accurately. By combining continuous spatio-temporal modeling and physical constraints, STC-ViT improves computational efficiency while ensuring the physical consistency of the prediction results.
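
The pre-processing step and the soft-penalty loss described above can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's code. The time derivative is approximated with finite differences along the time axis of 6-hourly data, and the physics term is a generic soft penalty: a weight `lam` times the mean squared residual of a constraint function. The summary does not say which physical law is penalized, so `mean_drift_residual` (the global mean should not change between steps) is a purely hypothetical placeholder, as are all function and parameter names.

```python
import numpy as np

def time_derivative(x, dt_hours=6.0):
    """Finite-difference d/dt along axis 0 (time); x has shape (T, H, W)."""
    return np.gradient(x, dt_hours, axis=0)

def physics_informed_loss(pred, target, residual_fn, lam=0.1):
    """Data MSE plus a soft penalty on the squared constraint residual."""
    data_loss = np.mean((pred - target) ** 2)
    penalty = np.mean(residual_fn(pred) ** 2)
    return data_loss + lam * penalty

def mean_drift_residual(pred):
    """Hypothetical constraint: the global mean stays constant over time."""
    global_mean = pred.mean(axis=(1, 2))
    return np.diff(global_mean)

# Toy field that grows by 12 units per 6-hour step at every grid point.
x = np.arange(4, dtype=float)[:, None, None] * 12.0 * np.ones((4, 2, 3))
dxdt = time_derivative(x)  # rate of change per hour at every grid point
loss = physics_informed_loss(x, x, mean_drift_residual, lam=0.1)
```

Because the penalty is soft, a prediction that violates the constraint is discouraged rather than forbidden; `lam` controls how strongly the constraint competes with the data-fitting term.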

STC-ViT: Spatio Temporal Continuous Vision Transformer for Weather Forecasting

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

Hidformer: Hierarchical Dual-Tower Transformer Using Multi-Scale Mergence for Long-Term Time Series Forecasting

Spatial-Temporal Convolutional Transformer Network for Multivariate Time Series Forecasting

Spatio-Temporal Transformer Network for Weather Forecasting

Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting

STVformer: A Spatial-Temporal-variable Transformer with Auxiliary Knowledge for Sea Surface Temperature Prediction

TENT: Tensorized Encoder Transformer for Temperature Forecasting

CViT: Continuous Vision Transformer for Operator Learning

HEAL-ViT: Vision Transformers on a spherical mesh for medium-range weather forecasting

WeatherFormer: Empowering Global Numerical Weather Forecasting with Space-Time Transformer

Multi-resolution Time-Series Transformer for Long-term Forecasting

Continuous Spatiotemporal Transformers

sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting

ST2T: A Spatio-Temporal Transformer for Cellular Traffic Prediction in Digital Twin Systems

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Numerical Weather Forecasting using Convolutional-LSTM with Attention and Context Matcher Mechanisms

A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers