Exploring Spatial Frequency Information for Enhanced Video Prediction Quality

Junyu Lai, Lianqiang Gan, Junhong Zhu, Huashuo Liu, Lianli Gao
DOI: https://doi.org/10.1109/tmm.2024.3384062
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Video prediction is a challenging spatiotemporal prediction task that generates future frames from historical observations. Although recently proposed deep learning-based methods significantly outperform legacy approaches, gaps remain between predictions and ground truth, rooted primarily in edge and motion blurring. On the one hand, because conventional performance metrics such as Mean Squared Error (MSE) and the Structural Similarity Index Measure (SSIM) cannot adequately capture this deficiency, we design a 3D Frequency Loss (3DFL) metric to better assess the similarity of predicted video frames. On the other hand, edge and motion blurring is mainly attributable to the predictive model's insufficient attention to high-spatial-frequency information, which arises from rapid pixel-value variations at object edges, and we observe that shallow networks are more adept at capturing such high-spatial-frequency information. Therefore, to alleviate edge and motion blurring, we propose a novel video prediction model, termed SDFNet, that extracts and integrates both shallow and deep spatially encoded features. To accommodate SDFNet's multi-branch input structure, we derive a frequency adaptive translator (FATranslator), which uses involution operators to adaptively extract inter-frame temporal dependencies from different spatial encoding layers and further mitigates motion blurring. Extensive experiments demonstrate that the proposed model achieves significant improvements in prediction accuracy and temporal consistency over current state-of-the-art models on various benchmarks. The results highlight the importance of spatial frequency modeling for video prediction performance and contribute to the advancement of multimedia technologies.
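The abstract does not give the formula for 3DFL. What follows is only a minimal sketch, under stated assumptions, of how a frequency-domain video loss can separate errors that MSE treats identically. It assumes grayscale clips shaped (T, H, W) and a 3D FFT over time and both spatial axes. It also assumes a squared spectral error weighted by normalized radial spatial frequency. That weighting, and the names `frequency_loss_3d`, `spatial_frequency_weights` and `box_blur`, are hypothetical illustrations, not the authors' definition of 3DFL.

```python
import numpy as np


def spatial_frequency_weights(h, w):
    """Normalized radial spatial frequency in [0, 1] for an h x w FFT grid."""
    fy = np.fft.fftfreq(h)  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)  # horizontal frequencies, cycles per pixel
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    peak = radius.max()
    return radius / peak if peak > 0 else radius


def frequency_loss_3d(pred, target, alpha=1.0):
    """Hypothetical 3D frequency loss for clips shaped (T, H, W).

    Assumption: the spectral error of a 3D FFT over time and space is weighted
    by (radial spatial frequency) ** alpha, so errors at edges and fine
    texture count more than errors in smooth regions or global brightness.
    """
    if pred.shape != target.shape or pred.ndim != 3:
        raise ValueError("expected two arrays of identical shape (T, H, W)")
    # The FFT is linear, so FFT(pred) - FFT(target) == FFT(pred - target).
    spec_err = np.fft.fftn(pred - target, axes=(0, 1, 2), norm="ortho")
    _, h, w = pred.shape
    weights = spatial_frequency_weights(h, w) ** alpha  # broadcast over T
    return float(np.mean(weights[None, :, :] * np.abs(spec_err) ** 2))


def box_blur(clip):
    """3x3 spatial box blur applied to every frame, wrapping at the borders."""
    out = np.zeros_like(clip)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(clip, shift=(dy, dx), axis=(1, 2))
    return out / 9.0


# Toy ground truth: a bright square moving one pixel right per frame.
T, H, W = 4, 32, 32
target = np.zeros((T, H, W))
for t in range(T):
    target[t, 8:24, 8 + t:24 + t] = 1.0

blurred = box_blur(target)                      # damages edges only
mse_blur = float(np.mean((blurred - target) ** 2))
brightened = target + np.sqrt(mse_blur)         # same MSE, edges intact
mse_bright = float(np.mean((brightened - target) ** 2))

loss_blur = frequency_loss_3d(blurred, target)
loss_bright = frequency_loss_3d(brightened, target)
print(f"MSE: blurred={mse_blur:.5f} brightened={mse_bright:.5f}")
print(f"3D frequency loss: blurred={loss_blur:.5f} brightened={loss_bright:.2e}")
```

The two predictions have the same MSE. The uniformly brightened clip differs from the target only in the zero-frequency bin, which gets weight 0, so its loss is essentially zero. The blurred clip's error sits at the square's edges and is penalized. Raising `alpha` above 1 shifts more weight toward the finest spatial detail.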
What problem does this paper attempt to address?