Enhancing spatiotemporal predictive learning: an approach with nested attention module

Shaoping Wang, Ren Han
DOI: https://doi.org/10.1007/s10845-023-02318-7
IF: 8.3
2024-02-21
Journal of Intelligent Manufacturing
Abstract: Spatiotemporal predictive learning is a deep learning method that generates future frames from historical frames in a self-supervised manner. Existing studies struggle to capture long-term dependencies and to produce accurate predictions over extended time horizons. To address these limitations, this paper introduces a nested attention module, a dedicated attention mechanism that captures the spatiotemporal correlations of the input historical frames. The nested attention module decomposes temporal attention into inter-frame channel attention and spatiotemporal attention, and uses a nested attention mechanism to capture long-term temporal dependencies, which improves the model's performance and generalization ability. Furthermore, to prevent overfitting, a new regularization method is proposed that considers both the intra-frame spatial error and the inter-frame temporal evolution error of sequence frames, and that makes the reinforcement learning model more robust to dropout operations. The proposed model achieves state-of-the-art performance on four benchmark datasets: Moving MNIST (handwritten digits), Human3.6M, a sea surface temperature (SST) dataset, and KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute). Extended experiments demonstrate the generalization and extensibility of the nested attention module on real-world datasets. When predicting 10 frames on Moving MNIST, the model reduces mean squared error (MSE) by 31.7% and mean absolute error (MAE) by 26.9%. The proposed model provides a new baseline for future research on spatiotemporal predictive learning tasks.
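The abstract names the two ingredients of the regularizer (an intra-frame spatial error and an inter-frame temporal evolution error) and its goal (robustness to dropout), but it does not give a formula. The sketch below is one plausible reading, not the paper's method: it assumes the spatial term is a per-pixel MSE, the temporal term is the MSE between frame-to-frame differences, and dropout robustness is encouraged by penalizing disagreement between two dropout-perturbed forward passes (an R-Drop-style consistency term). The function names and the weight `alpha` are hypothetical.

```python
import numpy as np


def spatial_error(pred, target):
    """Intra-frame spatial error (assumed form): MSE over every pixel of every frame.

    pred, target: arrays of shape (T, H, W), i.e. T frames of size H x W.
    """
    return float(np.mean((pred - target) ** 2))


def temporal_evolution_error(pred, target):
    """Inter-frame temporal evolution error (assumed form).

    Compares how the sequences change from one frame to the next:
    np.diff along the time axis yields T-1 difference frames.
    """
    d_pred = np.diff(pred, axis=0)
    d_target = np.diff(target, axis=0)
    return float(np.mean((d_pred - d_target) ** 2))


def dropout_consistency(pred_a, pred_b):
    """Hypothetical consistency penalty between two dropout forward passes.

    Uses both the spatial and the temporal term, so the two passes must
    agree on individual frames and on their frame-to-frame evolution.
    """
    return spatial_error(pred_a, pred_b) + temporal_evolution_error(pred_a, pred_b)


def regularized_loss(pred_a, pred_b, target, alpha=1.0):
    """Task loss averaged over both passes plus a weighted consistency term.

    alpha is a hypothetical weighting hyperparameter.
    """
    task = 0.5 * (spatial_error(pred_a, target) + spatial_error(pred_b, target))
    return task + alpha * dropout_consistency(pred_a, pred_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((10, 64, 64))  # 10 ground-truth frames of 64x64
    # Stand-ins for two forward passes with different dropout masks.
    pred_a = target + 0.05 * rng.standard_normal(target.shape)
    pred_b = target + 0.05 * rng.standard_normal(target.shape)
    print(regularized_loss(pred_a, pred_b, target))
```

Separating the two terms makes the distinction concrete: a prediction offset by a constant brightness has a nonzero spatial error but zero temporal evolution error, whereas a prediction with the frames in reverse order can match every frame's pixel statistics yet incur a large temporal error.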
Engineering, Manufacturing; Computer Science, Artificial Intelligence