Hybrid Transformers with Attention-guided Spatial Embeddings for Makeup Transfer and Removal

Mingxiu Li, Wei Yu, Qinglin Liu, Zonglin Li, Ru Li, Bineng Zhong, Shengping Zhang
DOI: https://doi.org/10.1109/tcsvt.2023.3312790
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Existing makeup transfer methods typically transfer simple makeup colors in a well-conditioned face image and fail to handle makeup style details (e.g., complicated colors and shapes) and facial occlusion. To address these problems, this paper proposes Hybrid Transformers with Attention-guided Spatial Embeddings (named HT-ASE) for makeup transfer and removal. Specifically, a makeup context extractor adopts makeup context global-local interactions to aggregate the high-level context and low-level detail features of the makeup styles, which obtains the context-aware makeup features that encode the complicated colors and shapes of the makeup styles. A face identity extractor adopts a face identity local interaction to aggregate the identity-relevant features of shallow layers into identity semantic features, which refines the identity features. A spatially similarity-aware fusion network introduces a spatially-adaptive layer-instance normalization with attention-guided spatial embeddings to perform semantic alignment and fusion between the makeup and identity features, yielding precise and robust transfer results even with large spatial misalignment and facial occlusion. Extensive experimental results demonstrate that the proposed method outperforms the state-of-the-art methods, especially in the preservation of makeup style details and handling facial occlusion.
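The fusion step named in the abstract can be illustrated with a rough sketch. The abstract gives no equations, so everything below is an assumption and not the authors' implementation: features are stored as channels x positions lists, attention is plain scaled dot-product between identity and makeup positions, and the normalization blends instance and layer statistics with a fixed scalar rho. The names attention, warp, spatial_adalin, gamma_src, beta_src and rho are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(identity, makeup):
    # identity: [C][N], makeup: [C][M] -> weights [N][M], each row sums to 1.
    # Assumption: scaled dot-product similarity between feature columns.
    channels = len(identity)
    scale = math.sqrt(channels)
    return [
        softmax([
            sum(identity[c][i] * makeup[c][j] for c in range(channels)) / scale
            for j in range(len(makeup[0]))
        ])
        for i in range(len(identity[0]))
    ]

def warp(attn, maps):
    # Re-sample makeup-side maps [C][M] onto identity positions -> [C][N].
    return [
        [sum(a * ch[j] for j, a in enumerate(row)) for row in attn]
        for ch in maps
    ]

def normalize(values, eps=1e-5):
    # Zero-mean, unit-variance normalization of a flat list.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def spatial_adalin(identity, makeup, gamma_src, beta_src, rho=0.5):
    # Assumed form: gamma * (rho * IN(x) + (1 - rho) * LN(x)) + beta,
    # where gamma and beta vary per spatial position after attention warping.
    channels, n = len(identity), len(identity[0])
    attn = attention(identity, makeup)
    gamma = warp(attn, gamma_src)
    beta = warp(attn, beta_src)
    inst = [normalize(ch) for ch in identity]            # per-channel statistics
    flat = normalize([v for ch in identity for v in ch])  # statistics over all channels
    layer = [flat[c * n:(c + 1) * n] for c in range(channels)]
    return [
        [
            gamma[c][i] * (rho * inst[c][i] + (1 - rho) * layer[c][i]) + beta[c][i]
            for i in range(n)
        ]
        for c in range(channels)
    ]

# Toy inputs: 2 channels, 3 identity positions, 2 makeup positions.
identity = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
makeup = [[0.5, 1.5], [1.0, -1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
fused = spatial_adalin(identity, makeup, ones, zeros, rho=1.0)
```

Because the modulation maps are re-sampled through attention rather than applied at fixed coordinates, each identity position draws its scale and shift from the makeup positions it most resembles, which is one plausible reading of how such a design could tolerate spatial misalignment between the two faces.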
engineering, electrical & electronic