Flexible Spatio-Temporal Networks for Video Prediction

Chaochao Lu, Michael Hirsch, Bernhard Schölkopf
DOI: https://doi.org/10.1109/cvpr.2017.230
2017-07-01
Abstract: We describe a modular framework for video frame prediction. We refer to it as a Flexible Spatio-Temporal Network (FSTN) as it allows the extrapolation of a video sequence as well as the estimation of synthetic frames lying in between observed frames and thus the generation of slow-motion videos. By devising a customized objective function comprising decoding, encoding, and adversarial losses, we are able to mitigate the common problem of blurry predictions, managing to retain high-frequency information even for relatively distant future predictions. We propose and analyse different training strategies to optimize our model. Extensive experiments on several challenging public datasets demonstrate both the versatility and validity of our model.