Exploring and Exploiting High-Order Spatial-Temporal Dynamics for Long-Term Frame Prediction
Kuai Dai, Xutao Li, Yunming Ye, Yaowei Wang, Shanshan Feng, Di Xian
DOI: https://doi.org/10.1109/tcsvt.2023.3298978
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Long-term spatial-temporal frame prediction aims to predict future image frames precisely and has numerous real-world applications. Existing deep learning prediction models mainly rely on advanced neural network architectures to model complicated spatial-temporal features, but make little effort to explore high-order correlations that would better capture long-term dynamics. Consequently, their long-term predictions suffer from inaccurate visual and motion details. In this article, we propose a high-order prediction model for long-term frame prediction, which improves appearance and motion details through dedicated high-order correlation modules in two respects. First, to enhance the appearance details of predicted frames, we propose a high-order appearance encoder module, in which high-order appearance features are captured with a carefully designed Non-local ConvLSTM. Second, to ensure the motion accuracy of predicted sequences, we design a high-order motion encoder module, which captures high-order motion patterns with adaptive motion extractors and preserves them with progressive memory banks. Comprehensive experiments on six challenging real-world datasets demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art methods.
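The abstract does not specify how the Non-local ConvLSTM is built. As a rough illustration of the general idea of non-local (all-pairs) correlation over a feature map, the following is a minimal NumPy sketch of a generic non-local operation. It is not the authors' module: the function name `non_local_block`, the absence of learned projections, and the scaled dot-product similarity are all assumptions made for illustration.

```python
import numpy as np

def non_local_block(x):
    """Generic non-local operation over spatial positions (illustrative only).

    x: feature map of shape (C, H, W). Returns an array of the same shape.
    Assumption: no learned projections; similarity is a scaled dot product.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # (C, N), N = H * W positions
    # Pairwise similarity between every pair of spatial positions.
    scores = flat.T @ flat / np.sqrt(c)           # (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    # Each position aggregates features from all positions.
    out = flat @ weights.T                        # (C, N)
    return x + out.reshape(c, h, w)               # residual connection

features = np.random.default_rng(0).normal(size=(4, 8, 8))
enhanced = non_local_block(features)
```

In a recurrent predictor, an operation of this kind would presumably be applied to the hidden state or input features at each time step, so that every position can draw on every other position rather than only a local convolutional neighborhood.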