Unpaired Person Image Generation With Semantic Parsing Transformation
Sijie Song, Wei Zhang, Jiaying Liu, Zongming Guo, Tao Mei
DOI: https://doi.org/10.1109/TPAMI.2020.2992105
Abstract: In this paper, we tackle the problem of pose-guided person image generation with unpaired data, which is challenging due to non-rigid spatial deformation. Instead of learning a fixed mapping directly between human bodies as previous methods do, we propose a new pathway that decomposes the single fixed mapping into two subtasks, namely, semantic parsing transformation and appearance generation. First, to simplify the learning of non-rigid deformation, a semantic generative network is developed to transform semantic parsing maps between different poses. Second, guided by semantic parsing maps, we render the foreground and background images separately: a foreground generative network learns to synthesize semantic-aware textures, and a background generative network learns to predict the background regions left missing by pose changes. Third, we enable pseudo-label training with unpaired data, and demonstrate that end-to-end training of the overall network further refines the semantic map prediction and, accordingly, the final results. Moreover, our method generalizes to other person image generation tasks defined on semantic maps, e.g., clothing texture transfer, controlled image manipulation, and virtual try-on. Experimental results on the DeepFashion and Market-1501 datasets demonstrate the superiority of our method, especially in preserving body shapes and clothing attributes, as well as in rendering structure-coherent backgrounds.
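To illustrate how a semantic parsing map can guide separate foreground and background rendering, here is a minimal sketch of a final compositing step. The abstract does not state how the two rendered images are merged, so the hard-mask composite below is an assumption, as are the names `BACKGROUND`, `foreground_mask`, and `composite` and the convention that label 0 marks background pixels. The generative networks themselves are not modeled; toy grids stand in for their outputs.

```python
from typing import List

Grid = List[List[int]]  # H x W grid of semantic labels or gray values

BACKGROUND = 0  # assumed label id for background pixels (hypothetical convention)


def foreground_mask(parsing: Grid) -> Grid:
    """Return 1 where the parsing map marks a person region, 0 for background."""
    return [[0 if label == BACKGROUND else 1 for label in row] for row in parsing]


def composite(parsing: Grid, foreground: Grid, background: Grid) -> Grid:
    """Take foreground pixels inside the person region and background pixels elsewhere."""
    mask = foreground_mask(parsing)
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


# Toy 2x3 example: labels 1 and 2 stand for hypothetical body-part classes.
target_parsing = [[0, 1, 1],
                  [0, 2, 0]]
fg = [[9, 9, 9], [9, 9, 9]]  # stand-in for the foreground network output
bg = [[5, 5, 5], [5, 5, 5]]  # stand-in for the background network output
result = composite(target_parsing, fg, bg)
```

In this sketch the parsing map in the target pose decides, pixel by pixel, which rendered image contributes to the output. A soft (fractional) mask would be a natural alternative, but nothing in the abstract confirms either choice.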