Abstract: Pose-guided person image generation, which aims to transfer a given person to a target pose, has recently received considerable research attention. Because pose variations cause spatial misalignment and occlusion of local body parts, the task remains challenging, especially in maintaining high-fidelity textures and body structures in the generated images. In addition, most existing works are limited by the small number of texture styles in the available person datasets, which restricts the diversity of the generated persons' appearances. To solve these problems, we design a Kernel-based Texture-Fusion Joint Refinement Network (TFJR-Net) to jointly refine the structure and texture information of generated images. First, we leverage a bone-map representation to guide the generation of human parsing maps; it carries more structural priors and richer context information than traditional key-point maps and thus reduces the uncertainty of the generated body structures. Next, we propose a Texture-Kernel Injection Normalization (TKIN) module that injects a per-region texture kernel into the corresponding semantic region of the human parsing map, which decouples texture from shape information and preserves fine-grained features for complex textures. Furthermore, we are the first to introduce external texture patterns from outside the dataset into human semantic regions such as the upper clothes. We fuse the two texture domains in a shared texture space through our texture-fusion TKIN modules. Extensive experiments are conducted on the DeepFashion dataset, with the DTD dataset as an external texture source. The experimental results demonstrate that our method generates person images with better textures and structures than state-of-the-art methods, and show that it generalizes to absorbing diverse external textures when generating person images. The source code is available at https://github.com/pilgrim00/TKIN.
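
The per-region injection idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' TKIN module: the function name `region_kernel_injection`, the use of instance normalization, and the choice of a 1x1 (channel-mixing) kernel per region are all assumptions made only to show how a parsing map can route a different kernel to each semantic region.

```python
import numpy as np


def region_kernel_injection(features, parsing, kernels, eps=1e-5):
    """Hypothetical sketch of region-wise kernel injection.

    features: (C, H, W) float array of feature maps.
    parsing:  (H, W) integer array of semantic region labels.
    kernels:  dict mapping a region label to a (C, C) matrix, used here
              as a 1x1 convolution kernel (an assumption of this sketch).
    """
    # Instance-normalize each channel over the spatial dimensions.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normed = (features - mean) / (std + eps)

    # Regions without a kernel keep their normalized features.
    out = normed.copy()
    for label, kernel in kernels.items():
        mask = parsing == label  # (H, W) boolean mask of this region
        # A 1x1 convolution is a per-pixel matrix multiply over channels;
        # normed[:, mask] has shape (C, number_of_pixels_in_region).
        out[:, mask] = kernel @ normed[:, mask]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 6, 6))
    labels = np.zeros((6, 6), dtype=int)
    labels[2:4, 2:4] = 1  # a small "upper clothes" region, for illustration
    result = region_kernel_injection(feats, labels, {1: 2.0 * np.eye(4)})
    print(result.shape)
```

Because each kernel touches only the pixels selected by its mask, swapping the kernel of one region (for example, with one derived from an external texture) leaves every other region unchanged, which is the kind of texture and shape decoupling the abstract refers to.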

PCFN: Progressive Cross-Modal Fusion Network for Human Pose Transfer

FreqHPT: Frequency-aware Attention and Flow Fusion for Human Pose Transfer

Adaptively Fusing Complete Multi-resolution Features for Human Pose Estimation

Context-Guided Adaptive Network for Efficient Human Pose Estimation

Progressive and Aligned Pose Attention Transfer for Person Image Generation

A 3D Mesh-based Lifting-and-Projection Network for Human Pose Transfer

Complementary Feature Pyramid Network for Human Pose Estimation

Towards Fine-Grained Human Pose Transfer With Detail Replenishing Network

PoNA: Pose-Guided Non-Local Attention for Human Pose Transfer

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

CPD-GAN: Cascaded Pyramid Deformation GAN for Pose Transfer

Human Pose Transfer by Adaptive Hierarchical Deformation

Attention-Guided GANs for Human Pose Transfer

Exploiting appearance transfer and multi-scale context for efficient person image generation

Pose-Guided High-Resolution Appearance Transfer via Progressive Training

FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions

Hierarchical Generation Of Human Pose With Part-Based Layer Representation

Exploring Kernel-based Texture Transfer for Pose-guided Person Image Generation

SCRN: Stepwise Change and Refine Network Based Semantic Distribution for Human Pose Transfer

Graph-Based Progressive Fusion Network for Multi-Modality Vehicle Re-Identification

Attentional pixel-wise deformation for pose-based human image generation