Abstract: In this paper, we propose Occlusion-Aware Warping GAN (OAW-GAN), a unified Human Video Synthesis (HVS) framework that tackles human video motion transfer, attribute editing, and inpainting. To our knowledge, this is the first work that handles all of these tasks with a single model trained once. Although existing GAN-based HVS methods have achieved great success, they either fail to preserve appearance details because spatial consistency between the synthesized target frames and the input source images is lost, or generate incoherent videos because temporal consistency among frames is lost. Moreover, most of them cannot create new content while keeping existing content, and fail especially when some regions of the target are invisible in the source due to self-occlusion. To address these limitations, we first introduce the Coarse-to-Fine Flow Warping Network (C2F-FWN), which estimates a spatially and temporally consistent transformation between source and target, together with an occlusion mask indicating which parts of the target are invisible in the source. The flow and the mask are then scaled and fed into the pyramidal stages of OAW-GAN, guiding Occlusion-Aware Synthesis (OAS), which can be abstracted into visible-part re-utilization and invisible-part inpainting at the feature level and effectively alleviates the self-occlusion problem. Extensive experiments on both human video (i.e., iPER, SoloDance) and image (i.e., DeepFashion) datasets demonstrate the superiority of our approach over existing state-of-the-art methods. We also show that, beyond the motion transfer task addressed by previous works, our framework can further achieve attribute editing and texture inpainting, which paves the way towards unified HVS.
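
The occlusion-aware synthesis idea described in the abstract (re-use warped source features where the target is visible in the source, fall back to newly generated features where it is occluded) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear blending rule, the flow layout `(dx, dy)`, and the nearest-neighbour sampling are all assumptions chosen to keep the example short, and a real model would presumably use learned features and differentiable bilinear sampling.

```python
import numpy as np

def resize_nearest(x, h, w):
    """Nearest-neighbour resize of an array shaped (H, W, ...). Hypothetical helper."""
    rows = np.arange(h) * x.shape[0] // h
    cols = np.arange(w) * x.shape[1] // w
    return x[rows][:, cols]

def scale_flow(flow, h, w):
    """Resize a (H, W, 2) flow field to (h, w) and rescale its (dx, dy) displacements."""
    sy, sx = h / flow.shape[0], w / flow.shape[1]
    return resize_nearest(flow, h, w) * np.array([sx, sy])

def warp(feat, flow):
    """Backward warp: out[y, x] = feat[y + dy, x + dx], nearest sample, clamped to the border."""
    h, w = feat.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return feat[src_y, src_x]

def occlusion_aware_blend(src_feat, gen_feat, flow, occ_mask):
    """Blend at one pyramid level (assumed rule, not taken from the paper).

    src_feat, gen_feat: (h, w, C) source and generated features at this level.
    flow: (H, W, 2) full-resolution flow; occ_mask: (H, W), 1 = invisible in source.
    """
    h, w = src_feat.shape[:2]
    level_flow = scale_flow(flow, h, w)                  # flow scaled to this stage
    m = resize_nearest(occ_mask, h, w)[..., None]        # mask scaled to this stage
    # Visible regions re-use warped source features; occluded ones use generated features.
    return (1.0 - m) * warp(src_feat, level_flow) + m * gen_feat
```

Calling `occlusion_aware_blend` once per pyramid stage, each time with that stage's feature resolution, mirrors the "scaled and fed into the pyramidal stages" step in spirit only; how the actual network combines the two feature sources is not specified in the abstract.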

Ivs-Net: Learning Human View Synthesis from Internet Videos

Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans

Implicit Neural Representations With Structured Latent Codes for Human Body Modeling

Novel View Synthesis of Humans using Differentiable Rendering

Novel View Synthesis of Dynamic Human with Sparse Cameras

Novel View Synthesis of Human Interactions from Sparse Multi-view Videos

Novel View Synthesis from only a 6-DoF Camera Pose by Two-stage Networks

ReN Human: Learning Relightable Neural Implicit Surfaces for Animatable Human Rendering

OAW-GAN: Occlusion-Aware Warping GAN for Unified Human Video Synthesis

Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild

Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

Learning Dynamic Textures for Neural Rendering of Human Actors

Intrinsic Temporal Regularization for High-resolution Human Video Synthesis

AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation

Neural Rendering and Reenactment of Human Actor Videos

Neural Capture of Animatable 3D Human from Monocular Video

Generating 3D-Consistent Videos from Unposed Internet Photos

Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors

EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings

Novel View Synthesis from a Single Unposed Image via Unsupervised Learning