Abstract: Video frame prediction is both challenging and critical for computer vision. Although research on predicting video frames has gradually shifted from pixel-law based methods to motion-based ones, existing predictors often generate ambiguous future frames, especially for long-term predictions. This paper proposes a composed model that generates future frames with more detail. First, to further exploit motion information, we design a single motion decoder to improve the efficiency of the motion encoder in the original motion-content network (MCnet). Second, to alleviate prediction ambiguity, we use edges both with and without semantic meaning, taken from the holistically-nested edge detection (HED) module, as content details. Third, based on the conclusion that the mean squared error (MSE) loss and the traditional generative adversarial learning framework cause the unsatisfactory predictions of MCnet, we design a composite loss function that guides our model to focus simultaneously on motions and content details. Building on the same conclusion, we finally embed our model in an improved generative adversarial network, which further enhances its performance. Experimental results on the benchmark KTH and UCF101 datasets show that our model outperforms state-of-the-art predictors, such as the basic MCnet, the predictive neural network (PredNet), and the PredNet with a reduced-gate convolutional network (rgc-PredNet), in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), especially for long-term video frame prediction.
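As a point of reference for the PSNR metric named in the abstract, the following is a minimal sketch of the standard PSNR definition computed from the MSE between a predicted frame and its ground truth. It is not the authors' evaluation code: the function names `mse` and `psnr`, the flat-sequence frame representation, and the default peak value of 255 (8-bit pixels) are assumptions made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between two frames given as flat pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed pixel peak."""
    err = mse(pred, target)
    if err == 0:
        # Identical frames: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    ground_truth = [10, 20, 30, 40]
    predicted = [11, 21, 31, 41]  # every pixel off by one, so MSE = 1
    print(psnr(predicted, ground_truth))
```

Higher PSNR means a lower pixel-wise error. Because PSNR is a monotone function of MSE, it rewards the same blurry averages that the MSE loss does, which is one reason it is usually reported alongside a structural metric such as SSIM.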

Analyzing and Improving the Pyramidal Predictive Network for Future Video Frame Prediction

Pyramidal Predictive Network: A Model for Visual-Frame Prediction Based on Predictive Coding Theory

Anti-aliasing Predictive Coding Network for Future Video Frame Prediction

Looking-Ahead: Neural Future Video Frame Prediction

Adaptive Recurrent Frame Prediction with Learnable Motion Vectors

Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

RPP-Net: Rigid Constrained Point Cloud Prediction Network

Video prediction: a step-by-step improvement of a video synthesis network

Video Frame Prediction by Deep Multi-Branch Mask Network

A Little Improvement on Future Frame Prediction for Anomaly Detection Baseline

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Towards Lightweight Neural Network-based Chroma Intra Prediction for Video Coding

An End-to-End Future Frame Prediction Method for Vehicle-Centric Driving Videos

Probabilistic Video Prediction From Noisy Data With a Posterior Confidence

A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation

One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network

PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning

Folded Recurrent Neural Networks for Future Video Prediction

Fractal Pyramid Networks

Video Frame Prediction with Dual-Stream Deep Network Emphasizing Motions and Content Details