Stochastic Video Generation With Disentangled Representations

Maomao Li, Chun Yuan, Zhihui Lin, Zhuobin Zheng, Yangyang Cheng
DOI: https://doi.org/10.1109/ICME.2019.00047
2019-01-01
Abstract: Frame-to-frame uncertainty is a major challenge in video prediction. Deterministic models tend to average over the possible future states. Some methods, such as the SVG model [1], handle this uncertainty by drawing samples from a prior at each time step. However, these models typically use a single set of latent variables to represent the entire stochastic part of a video clip, whereas sequential data often involves multiple independent factors. In this paper, we exploit the complex representation of information in video sequences by formulating it explicitly with a disentangled-representation stochastic video generation (DR-SVG) model, which imposes a sequence-dependent prior and a sequence-independent prior on different sets of latent variables. Using a variational lower bound and adversarial objective functions in latent space, our model produces crisper frames with clear content and pose, which correspond to the sequence-dependent and sequence-independent components, respectively.