PCFN: Progressive Cross-Modal Fusion Network for Human Pose Transfer

Wei Yu, Yanping Li, Rui Wang, Wenming Cao, Wei Xiang
DOI: https://doi.org/10.1109/tcsvt.2022.3233060
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The goal of human pose transfer is to transfer the person in an image from the original pose to a desired one. Existing methods that work in a progressive manner have achieved great success. However, they fail to remove background distractions and preserve appearance details in the synthesized images, because the correlation between the image and the pose is not fully exploited. To this end, we propose a novel progressive cross-modal fusion network (PCFN), which consists of multiple cascaded cross-modal fusion blocks (CMFBs). Each CMFB comprises a feature fusion module (FFM) and a cross-modal module (CMM), which together take full advantage of appearance and shape information. From a global perspective, FFM fully exploits the correlation between image features and pose features through residual gated convolution. Benefiting from feature integration and dynamic selection, CMFB can extract useful information from the image-pose stream. From a local perspective, CMM uses feature-conditioned gated convolution and a pose-guided heterogeneous attention mechanism to update all codes in a crossing manner and to strengthen the interaction between fused information and structural information. Qualitative and quantitative experiments demonstrate the superiority of PCFN over state-of-the-art methods: it transfers the correct human features and increases the authenticity of the generated images. PCFN can also be applied to supplement datasets for person re-identification (ReID). Beyond its effectiveness for human pose transfer, our use of gated convolution and attention also provides a reference for other conditional generation tasks.
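The abstract names residual gated convolution as the core of FFM but gives no equations. As a rough illustration only, the sketch below assumes the common gated-convolution form (a tanh feature branch multiplied element-wise by a sigmoid gate), applied to the channel-wise concatenation of image and pose features and added back to the input as a residual. A small cascade of blocks that update the image code and then the pose code stands in, loosely, for the progressive CMFB stack and its "crossing" update. All names (`residual_gated_fusion`, `cross_block`, `make_block`, and so on) are hypothetical, the pose-guided heterogeneous attention is omitted entirely, and none of this should be read as the authors' implementation.

```python
import math
import random

# Feature maps are nested lists indexed fmap[y][x][c] (height x width x channels).


def conv2d(fmap, weight, bias):
    """Zero-padded 'same' 2D convolution (cross-correlation, as in deep learning).

    weight[ky][kx][o][i] maps input channel i to output channel o at offset
    (ky, kx) of a square k x k kernel; bias[o] is added at every pixel.
    """
    h, w, k = len(fmap), len(fmap[0]), len(weight)
    pad = k // 2
    out = [[list(bias) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = out[y][x]
            for ky in range(k):
                for kx in range(k):
                    sy, sx = y + ky - pad, x + kx - pad
                    if 0 <= sy < h and 0 <= sx < w:
                        pix = fmap[sy][sx]
                        for o, row in enumerate(weight[ky][kx]):
                            acc[o] += sum(wi * vi for wi, vi in zip(row, pix))
    return out


def concat(a, b):
    """Channel-wise concatenation of two feature maps with the same H x W."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def residual_gated_fusion(x, cond, feat_conv, gate_conv):
    """Assumed form: x + tanh(conv_f([x, cond])) * sigmoid(conv_g([x, cond])).

    The sigmoid gate decides, per pixel and channel, how much of the fused
    update is written back into x. Both convs must output len(x[0][0]) channels.
    """
    z = concat(x, cond)
    f = conv2d(z, *feat_conv)
    g = conv2d(z, *gate_conv)
    return [[[xi + math.tanh(fi) * sigmoid(gi) for xi, fi, gi in zip(px, pf, pg)]
             for px, pf, pg in zip(rx, rf, rg)]
            for rx, rf, rg in zip(x, f, g)]


def random_conv(rng, k, c_in, c_out, scale=0.1):
    """Random k x k conv parameters as (weight[ky][kx][o][i], bias[o])."""
    weight = [[[[rng.uniform(-scale, scale) for _ in range(c_in)]
                for _ in range(c_out)]
               for _ in range(k)]
              for _ in range(k)]
    return weight, [0.0] * c_out


def make_block(rng, c_img, c_pose, k=3):
    """Random parameters for one hypothetical cross-update block."""
    c = c_img + c_pose
    return {
        "img": (random_conv(rng, k, c, c_img), random_conv(rng, k, c, c_img)),
        "pose": (random_conv(rng, k, c, c_pose), random_conv(rng, k, c, c_pose)),
    }


def cross_block(img, pose, params):
    """Update the image code from (img, pose), then the pose code from the new image code."""
    new_img = residual_gated_fusion(img, pose, *params["img"])
    new_pose = residual_gated_fusion(pose, new_img, *params["pose"])
    return new_img, new_pose


def random_map(rng, h, w, c):
    return [[[rng.uniform(-1, 1) for _ in range(c)] for _ in range(w)] for _ in range(h)]


if __name__ == "__main__":
    rng = random.Random(0)
    img, pose = random_map(rng, 4, 4, 3), random_map(rng, 4, 4, 2)
    blocks = [make_block(rng, 3, 2) for _ in range(3)]  # three cascaded blocks
    for params in blocks:
        img, pose = cross_block(img, pose, params)
    print(len(img), len(img[0]), len(img[0][0]), len(pose[0][0]))  # 4 4 3 2
```

In this reading, the gate is what provides the "dynamic selection": where the sigmoid output is near 0 the block passes its input through almost unchanged, and where it is near 1 the fused update is written in. Plain Python lists are used only so the sketch runs without dependencies; a practical model would use a tensor library and learned weights.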
engineering, electrical & electronic