Abstract: Person image generation aims to synthesize images that preserve a person's original appearance under different target poses. Recent work has shown that the key to this task is aligning the appearance domain with the pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning, and cross attention, often struggle to produce fine texture details. Some estimate appearance flows inaccurately because they lack a global receptive field. Others can only perform cross-domain alignment on high-level feature maps with small spatial dimensions, since computational complexity grows quadratically with feature map size. In this article, multi-scale alignment, in both low-level and high-level domains, is demonstrated to be important for reliable cross-domain alignment of appearance and pose. To this end, a novel and effective method named Multi-scale Cross-domain Alignment (MCA) is proposed. First, MCA adopts a global context aggregation transformer, which employs pair-wise window-based cross attention, to model multi-scale interaction between pose and appearance inputs. Then, leveraging the integrated global source information at each target position, MCA applies a flexible flow prediction head and point correlation to warp and fuse features for generating the final transformed person image. MCA outperforms other methods on two popular datasets, which verifies the effectiveness of the approach.
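
The quadratic-cost argument and the idea of window-based cross attention can be illustrated with a minimal NumPy sketch. This is not the MCA implementation: the function names, the non-overlapping window layout, the pairing of each pose window with the appearance window at the same position, and the absence of learned projections are all assumptions made only for illustration.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) map into non-overlapping w x w windows: (N, w*w, C)."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_merge(windows, H, W, w):
    """Inverse of window_partition: (N, w*w, C) back to (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(pose_feat, app_feat, w):
    """Hypothetical sketch: pose features query appearance features per window.

    Queries come from the pose map, keys and values from the appearance map.
    No learned projections are used (an assumption to keep the sketch short).
    """
    H, W, C = pose_feat.shape
    q = window_partition(pose_feat, w)                      # (N, w*w, C)
    k = window_partition(app_feat, w)                       # (N, w*w, C)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))   # (N, w*w, w*w)
    return window_merge(attn @ k, H, W, w)                  # values = keys here

def attention_pairs(H, W, w=None):
    """Number of query-key pairs scored: global if w is None, else windowed."""
    n = H * W
    return n * n if w is None else n * w * w

rng = np.random.default_rng(0)
pose = rng.standard_normal((8, 8, 4))
appearance = rng.standard_normal((8, 8, 4))
aligned = window_cross_attention(pose, appearance, w=4)     # shape (8, 8, 4)
```

Under these assumptions, a 64×64 feature map needs 4096² = 16,777,216 query-key scores with global attention, but only 4096 × 64 = 262,144 with 8×8 windows, a 64-fold reduction. This is why windowing makes attention affordable on low-level, high-resolution feature maps.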

Multi-Scale Correspondence Learning for Person Image Generation

Multi‐scale Cross‐domain Alignment for Person Image Generation

GLocal: Global Graph Reasoning and Local Structure Transfer for Person Image Generation

Correspondence Learning for Controllable Person Image Generation

Precise Correspondence Enhanced GAN for Person Image Generation

Exploiting appearance transfer and multi-scale context for efficient person image generation

Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation

Multi-scale Matching Networks for Semantic Correspondence

Mutually Activated Residual Linear Modeling GAN for Pose-Guided Person Image Generation

MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation

Unsupervised Person Image Generation with Semantic Parsing Transformation

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

Person Search by Multi-Scale Matching

LSG-GAN: Latent space guided generative adversarial network for person pose transfer

Multi Positive Contrastive Learning with Pose-Consistent Generated Images

Hierarchical Generation Of Human Pose With Part-Based Layer Representation

Multiscale Generative Model of Human Faces

Multi-Scale Structure-Aware Network for Human Pose Estimation

Unsupervised Learning of Depth Estimation and Camera Pose With Multi-Scale GANs

Progressive and Aligned Pose Attention Transfer for Person Image Generation

SCRN: Stepwise Change and Refine Network Based Semantic Distribution for Human Pose Transfer