Abstract: Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. The training of LFDM consists of two separate stages: (1) an unsupervised learning stage to train a latent flow auto-encoder for spatial content generation, including a flow predictor to estimate latent flow between pairs of video frames, and (2) a conditional learning stage to train a 3D-UNet-based diffusion model (DM) for temporal latent flow generation. Unlike previous DMs, which operate in a pixel space or latent feature space that couples spatial and temporal information, the DM in our LFDM only needs to learn a low-dimensional latent flow space for motion generation, and is therefore more computationally efficient. We conduct comprehensive experiments on multiple datasets, where LFDM consistently outperforms prior art. Furthermore, we show that LFDM can be easily adapted to new domains by simply finetuning the image decoder. Our code is available at https://github.com/nihaomiao/CVPR23_LFDM.
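The core operation the abstract describes, warping the latent of the given image with a generated flow sequence, can be illustrated with a minimal sketch. This is not the released LFDM code: it assumes a single-channel latent map stored as a nested list, a flow expressed as per-pixel offsets in latent-grid units, backward warping with bilinear interpolation, and border clamping. The names `bilinear_sample`, `warp_latent`, and `warp_sequence` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def bilinear_sample(feat: Grid, x: float, y: float) -> float:
    """Sample feat at a continuous (x, y) location, clamping to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_latent(feat: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward warp: output[i][j] reads feat at (j + flow_x, i + flow_y)."""
    h, w = len(feat), len(feat[0])
    return [
        [bilinear_sample(feat, j + flow_x[i][j], i + flow_y[i][j]) for j in range(w)]
        for i in range(h)
    ]


def warp_sequence(feat: Grid, flows: List[Tuple[Grid, Grid]]) -> List[Grid]:
    """Warp the same reference latent once per flow field, one result per frame."""
    return [warp_latent(feat, fx, fy) for fx, fy in flows]


if __name__ == "__main__":
    latent = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    zeros = [[0.0] * 3 for _ in range(2)]
    shift = [[1.0] * 3 for _ in range(2)]
    frames = warp_sequence(latent, [(zeros, zeros), (shift, zeros)])
    for frame in frames:
        print(frame)
```

Every frame is produced from the same reference latent, so in this sketch the spatial content comes entirely from the given image and the flow sequence carries only the motion. In a full pipeline an image decoder would then map each warped latent back to pixels.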

Conditional Inpainting Generative Flow

Diverse Image Inpainting with Normalizing Flow

SelFSR: Self-Conditioned Face Super-Resolution in the Wild via Flow Field Degradation Network

Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation

Generate Optical Flow with Conditional Generative Adversarial Network

C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

Progressive Inpainting Strategy with Partial Convolutions Generative Networks (PPCGN)

A Progressive Image Inpainting Algorithm with a Mask Auto-update Branch

Image Inpainting Based on Interactive Separation Network and Progressive Reconstruction Algorithm

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

Flow Matching in Latent Space

Incremental Focal Loss GANs

Free-Form Image Inpainting with Gated Convolution

Flow-Guided Diffusion for Video Inpainting

Towards An End-to-End Framework for Flow-Guided Video Inpainting

Semantically Consistent Video Inpainting with Conditional Diffusion Models

Indexicality, intensionality, and relativist post-semantics

Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows

StructureFlow: Image Inpainting Via Structure-aware Appearance Flow

AgeFlow: Conditional Age Progression and Regression with Normalizing Flows

RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior