Abstract: We propose a framework based on generative adversarial networks for video conversion. Our goal is to synchronize the movements of two different target videos (such as a person's head displacement and facial movements), even when those movements do not exist in the original video. Our key idea is to add a video prediction model to the original generative adversarial network framework, so that the generated video inherits the temporal characteristics of the target video, improving action consistency and the stability of temporal synchronization. During training, we detect landmark points to obtain and align the spatial position of the action in each video, which prevents spatial dislocation in the generated samples. Also during training, we generate sample (t), obtain sample (t+1) through a pre-trained time predictor, and compute a loss on the generated sample that is fed back to the pre-trained generative model. With this framework, we can (1) obtain usable training samples more conveniently, which broadens the range of cases the model applies to, and (2) improve the accuracy of the generated target video.
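
The predictor-based feedback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `temporal_prediction_loss` and `toy_predictor` are hypothetical, frames are flattened lists of floats rather than images, and both the mean squared error and the choice of comparing the predicted frame against the next generated frame are assumptions, since the abstract does not state the exact loss.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # a flattened frame; real frames would be image tensors


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_prediction_loss(
    generated: Sequence[Frame],
    predictor: Callable[[Frame], Frame],
) -> float:
    """Average error between predictor(generated[t]) and generated[t + 1].

    `predictor` stands in for the frozen, pre-trained time predictor.
    Comparing against the next generated frame is an assumption; the
    abstract does not state the exact reference frame or distance.
    """
    if len(generated) < 2:
        raise ValueError("need at least two frames")
    errors = [
        mse(predictor(generated[t]), generated[t + 1])
        for t in range(len(generated) - 1)
    ]
    return sum(errors) / len(errors)


def toy_predictor(frame: Frame) -> Frame:
    """Hypothetical stand-in: assumes every pixel brightens by 0.1 per step."""
    return [p + 0.1 for p in frame]


smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]   # follows the predicted motion
jittery = [[0.0, 0.0], [0.5, 0.5], [0.2, 0.2]]  # middle frame jumps

smooth_loss = temporal_prediction_loss(smooth, toy_predictor)
jittery_loss = temporal_prediction_loss(jittery, toy_predictor)
```

In this toy setting the sequence that follows the predictor's expected motion incurs a loss near zero, while the sequence with a jump is penalized. In an actual training loop, a differentiable version of this value would presumably be added to the adversarial loss and backpropagated into the generator.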

A generative-predictive framework used for video conversion

OAW-GAN: Occlusion-Aware Warping GAN for Unified Human Video Synthesis

Transframer: Arbitrary Frame Prediction with Generative Models

Video prediction: a step-by-step improvement of a video synthesis network

VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet

Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image

Predicting Diverse Future Frames with Local Transformation-Guided Masking

Video Content Swapping Using GAN

Multi-Frame Content Integration with a Spatio-Temporal Attention Mechanism for Person Video Motion Transfer

VEnhancer: Generative Space-Time Enhancement for Video Generation

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

Video-to-Video Translation with Global Temporal Consistency

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

GenDeF: Learning Generative Deformation Field for Video Generation

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

FrameBridge: Improving Image-to-Video Generation with Bridge Models

To Create What You Tell: Generating Videos from Captions

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Transforming Static Images Using Generative Models for Video Salient Object Detection