A lightweight multi-granularity asymmetric motion mode video frame prediction algorithm

Jie Yan, Guihe Qin, Minghui Sun, Yanhua Liang, Zhonghan Zhang, Yinghui Xu
DOI: https://doi.org/10.1007/s00371-024-03298-2
IF: 2.835
2024-03-18
The Visual Computer
Abstract: Due to the increasing demand for higher accuracy in video prediction, there has been a noticeable trend toward deeper networks and more intricate architectures. Although these approaches can enhance model performance, they come at the cost of longer training times and greater hardware requirements. To address this challenge, this study introduces a novel lightweight neural network architecture based on ConvLSTM. The proposed architecture consists of three core components: an asymmetric convolutional kernel (ACK), a fine-grained feature extractor (FFE), and a coarse-grained feature fuser (CFF). The ACK is designed to enhance motion modeling: it establishes a motion model independently in each direction, which allows individual motion patterns to be expanded. The FFE and CFF modules use a hierarchical localization technique to extract and integrate spatial texture features at the intra- and inter-levels, enabling efficient and lightweight video frame prediction. We conducted extensive performance evaluations of the proposed model on multiple datasets. Even with 50% fewer parameters than the baseline model, our approach achieves results competitive with those of existing methods.
computer science, software engineering
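The abstract does not specify the ACK's exact kernel shapes. A common reading of "asymmetric convolution" is a pair of 1×k and k×1 kernels, and the NumPy sketch below assumes that reading; it is not the authors' implementation. It shows how separate horizontal and vertical branches respond to motion along each axis, and how such a pair compares with a square k×k kernel in weight count. The names `conv2d_same`, `asymmetric_branches`, and `weight_counts`, the central-difference kernel, and the toy frames are all hypothetical.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive single-channel 2-D cross-correlation with edge-replicated 'same' padding."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out


def asymmetric_branches(x: np.ndarray, w_h: np.ndarray, w_v: np.ndarray):
    """Hypothetical ACK-style split: a 1 x k horizontal branch and a k x 1 vertical
    branch, kept separate so motion along each axis is modeled on its own."""
    assert w_h.shape[0] == 1 and w_v.shape[1] == 1
    return conv2d_same(x, w_h), conv2d_same(x, w_v)


def weight_counts(k: int, c_in: int, c_out: int):
    """Weights in one square k x k conv vs. a 1 x k plus k x 1 pair (biases ignored)."""
    return k * k * c_in * c_out, 2 * k * c_in * c_out


# Two consecutive 5 x 5 toy frames: a bright vertical bar moves one pixel right.
f0 = np.zeros((5, 5)); f0[:, 1] = 1.0
f1 = np.zeros((5, 5)); f1[:, 2] = 1.0
diff = f1 - f0  # temporal difference marks where the bar left and arrived

grad = np.array([[-1.0, 0.0, 1.0]])  # 1 x 3 central-difference kernel (assumed)
h_resp, v_resp = asymmetric_branches(diff, grad, grad.T)

print(h_resp[0])                 # horizontal branch responds to the horizontal motion
print(np.abs(v_resp).sum())      # vertical branch stays silent (sums to zero)
print(weight_counts(3, 64, 64))  # square vs. asymmetric weight counts
```

In this toy case the purely horizontal motion produces energy only in the horizontal branch, which is the intuition behind modeling each direction independently. The asymmetric pair uses 2/k as many weights as a square kernel (two-thirds for k = 3); the abstract does not say how the reported 50% parameter reduction is distributed across the ACK, FFE, and CFF, so this arithmetic is illustrative only.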
What problem does this paper attempt to address?