Abstract:We present a multi-scale predictive coding model for future video frames prediction. Drawing inspiration on the ``Predictive Coding" theories in cognitive science, it is updated by a combination of bottom-up and top-down information flows, which can enhance the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address the problem, our model employs a multi-scale approach (Coarse to Fine), where the higher level neurons generate coarser predictions (lower resolution), while the lower level generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. This enables comprehensive interaction between the current input and the historical states of LSTM compared with the traditional Encoder-LSTM-Decoder architecture, thus learning more believable temporal and spatial dependencies. Furthermore, to tackle the instability in adversarial training and mitigate the accumulation of prediction errors in long-term prediction, we propose several improvements to the training strategy. Our approach achieves good performance on datasets such as KTH, Moving MNIST and Caltech Pedestrian. Code is available at <a class="link-external link-https" href="https://github.com/Ling-CF/MSPN" rel="external noopener nofollow">this https URL</a>.

Z-Order Recurrent Neural Networks For Video Prediction

Frame Prediction Using Recurrent Convolutional Encoder with Residual Learning

PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning

Adaptive Hierarchical Motion-Focused Model for Video Prediction.

PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning

Adaptive Recurrent Frame Prediction with Learnable Motion Vectors.

+ YY ’ Z X-Residual Prediction Error TargetPredictionObservation Latent 0 State φ f 1 f 2

MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions

Folded Recurrent Neural Networks for Future Video Prediction

Cubic LSTMs for Video Prediction

State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend

Probabilistic Video Prediction From Noisy Data With a Posterior Confidence.

Video Frame Prediction by Deep Multi-Branch Mask Network

PredCNN: Predictive Learning with Cascade Convolutions

Iprnn - an Information-Preserving Model for Video Prediction Using Spatiotemporal Grus.

Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Video prediction: a step-by-step improvement of a video synthesis network

PSRUNet: a recurrent neural network for spatiotemporal sequence forecasting based on parallel simple recurrent unit

ASTM - an Attention Based Spatiotemporal Model for Video Prediction Using 3D Convolutional Neural Networks.

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction