Abstract: Image animation aims to animate a still image of an object of interest using poses extracted from another video sequence. By training on a large-scale video dataset, most existing approaches learn disentangled appearance and pose representations of the training frames. The desired output with a specific appearance and pose can then be synthesized by recombining the learned representations. However, in some real-world applications, test images may lack corresponding video ground truth or follow a different distribution from that of the training video frames (i.e., come from a different domain), which largely limits the performance of existing methods. In this paper, we propose domain-independent pose representations that are compatible with and accessible by still images from a different domain. Specifically, we devise a two-stage self-supervised pose adaptation framework for general image animation tasks. A domain-independent pose adaptation generative adversarial network (DIPA-GAN) and a shuffle-patch generative adversarial network (Shuffle-patch GAN) are proposed to penalize implausible pose and appearance, respectively, in the synthesized frame. Finally, experiments on various image animation tasks, including same/cross-domain moving objects, facial expression transfer, and human pose retargeting, demonstrate the superiority of the proposed framework over prior work.

Impact Statement: Image animation is a popular technology in video production. Benefiting from the rapid development of artificial intelligence (AI), recent image animation algorithms have been widely used in real-world applications such as virtual AI news anchors, virtual try-on, and face swapping. However, most existing methods are designed for specific cases. To animate a new portrait, users are asked to collect hundreds of images of the same person and train a new model. The technology proposed in this paper overcomes these training limitations and generalizes image animation. In the challenging cross-domain facial expression transfer task, a user study showed that our technology achieved a more than 20% increase in animation success rate. The proposed technology could benefit users in a wide variety of industries, including movie production, virtual reality, social media, and online retail.

Self-Supervised Pose Adaptation for Cross-Domain Image Animation.