Spatio-Temporal Transformer Network with Physical Knowledge Distillation for Weather Forecasting

Jing He,Junzhong Ji,Minglong Lei
DOI: https://doi.org/10.1145/3627673.3679841
2024-01-01
Abstract:Weather forecasting has become a popular research topic recently, which mainly benefits from the development of spatio-temporal neural networks to effectively extract useful patterns from weather data. Generally, the weather changes in the meteorological system are governed by physical principles. However, it is challenging for spatio-temporal methods to capture the physical knowledge of meteorological dynamics. To address this problem, we propose in this paper a spatio-temporal Transformer network with physical knowledge distillation (PKD-STTN) for weather forecasting. First, the teacher network is implemented by a differential equation network that models weather changes by the potential energy in the atmosphere to reveal the physical mechanism of atmospheric movements. Second, the student network uses a spatio-temporal Transformer that concurrently utilizes three attention modules to comprehensively capture the semantic spatial correlation, geographical spatial correlation, and temporal correlation from weather data. Finally, the physical knowledge of the teacher network is transferred to the student network by inserting a distillation position encoding into the Transformer. Notice that the output of the teacher network is distilled to the position encoding rather than the output of the student network, which can largely utilize physical knowledge without influencing the feature extraction process of Transformers. Experiments on benchmark datasets show that the proposed method can effectively utilize physical principles of weather changes and has obvious performance advantages compared with several strong baselines.
What problem does this paper attempt to address?