Abstract: We present a multi-scale predictive coding model for future video frame prediction. Drawing inspiration from "Predictive Coding" theories in cognitive science, the model is updated by a combination of bottom-up and top-down information flows, which enhances the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address this problem, our model employs a multi-scale (coarse-to-fine) approach, in which the higher-level neurons generate coarser predictions (lower resolution), while the lower-level neurons generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. Compared with the traditional Encoder-LSTM-Decoder architecture, this enables comprehensive interaction between the current input and the historical states of the LSTM, so that the model learns more plausible temporal and spatial dependencies. Furthermore, to tackle the instability of adversarial training and mitigate the accumulation of prediction errors in long-term prediction, we propose several improvements to the training strategy. Our approach achieves good performance on datasets such as KTH, Moving MNIST and Caltech Pedestrian. Code is available at <a class="link-external link-https" href="https://github.com/Ling-CF/MSPN" rel="external noopener nofollow">this https URL</a>.
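
The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch of per-level prediction targets: the lowest level is compared against the full-resolution frame and each higher level against a progressively coarser copy. This is not the authors' implementation. The names `downsample_2x` and `build_targets`, the use of 2x2 average pooling, and the factor of two between levels are all assumptions made for illustration, since the abstract does not specify how the lower-resolution targets are produced.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample_2x(frame: Frame) -> Frame:
    """Halve the resolution by averaging each non-overlapping 2x2 block.

    Assumption: average pooling; the paper may use a different operator.
    """
    h, w = len(frame), len(frame[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (frame[r][c] + frame[r][c + 1]
             + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def build_targets(frame: Frame, num_levels: int) -> List[Frame]:
    """Return one target per level: index 0 is the finest, the last the coarsest."""
    targets = [frame]
    for _ in range(num_levels - 1):
        targets.append(downsample_2x(targets[-1]))
    return targets


# A toy 4x4 frame with pixel values 0..15, split into three levels.
frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
targets = build_targets(frame, num_levels=3)
sizes = [(len(t), len(t[0])) for t in targets]
print(sizes)
```

In a setup like this, a higher network level would be trained against `targets[-1]` and the lowest level against `targets[0]`, so that each level's prediction error is measured at its own resolution.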

Motion-Aware Feature Enhancement Network for Video Prediction

Adaptive Hierarchical Motion-Focused Model for Video Prediction

Feature Based Inter Prediction Optimization for Non-Translational Video Coding in Cloud

Human Visual Perception Based Image Quality Assessment for Video Prediction

Forecasting Distillation: Enhancing 3D Human Motion Prediction with Guidance Regularization

Video Frame Prediction by Deep Multi-Branch Mask Network

MAU: A Motion-Aware Unit for Video Prediction and Beyond

A lightweight multi-granularity asymmetric motion mode video frame prediction algorithm

Motion and Context-Aware Audio-Visual Conditioned Video Prediction

Integrated Multiscale Appearance Features and Motion Information Prediction Network for Anomaly Detection

From Single to Multiple: Leveraging Multi-level Prediction Spaces for Video Forecasting

MMVP: Motion-Matrix-based Video Prediction

Adaptive Recurrent Frame Prediction with Learnable Motion Vectors

Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Motion-Aware Feature for Improved Video Anomaly Detection

Pair-wise Layer Attention with Spatial Masking for Video Prediction

Motion Graph Unleashed: A Novel Approach to Video Prediction

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

Video prediction: a step-by-step improvement of a video synthesis network

HumanMAC: Masked Motion Completion for Human Motion Prediction

Multi-level Motion Attention for Human Motion Prediction