Progressive Local-to-global Vision Transformer for Occluded Face Hallucination

Huan Wang, Jianning Chi, Chengdong Wu, Xiaosheng Yu, Hao Wu
DOI: https://doi.org/10.1007/s11042-023-15028-2
IF: 2.577
2023-01-01
Multimedia Tools and Applications
Abstract: Hallucinating a photo-realistic high-resolution (HR) face image from an occluded low-resolution (LR) face image is beneficial for a range of face-related applications. However, previous efforts focused on either super-resolving HR face images from non-occluded LR counterparts or inpainting occluded HR faces. For real-world face images in unconstrained environments, these challenges must be addressed jointly. In this paper, we develop a novel Local-to-Global Face Hallucination Transformer (LGFH-Transformer), which handles super-resolution (SR) and inpainting of occluded LR face images simultaneously in a unified framework. Specifically, the LGFH-Transformer is built on self-attention modules, which excel at modeling long-range dependencies between image patch sequences. Meanwhile, we introduce a mask-guided convolution and a gated mechanism into the building modules (i.e., multi-head attention and feed-forward network) of each Transformer block, bringing in the complementary strength of the convolution operation to emphasize spatially local context. Moreover, alongside the delicately designed local-to-global feature reasoning mechanism in the encoder, we exploit facial geometry priors (i.e., facial parsing maps) as semantic guidance in the decoder during hallucination to reconstruct more realistic facial details. Extensive experiments demonstrate the effectiveness and advancement of LGFH-Transformer.
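The abstract does not spell out how the mask-guided convolution or the gated mechanism is computed. As a rough illustration only, the sketch below uses a partial-convolution-style filter (responses renormalized by the number of visible pixels under the kernel) and a sigmoid gate that blends a local and a global feature. Both choices, and the names `mask_guided_conv` and `gated_mix`, are assumptions made here for illustration; they are not taken from the paper.

```python
import numpy as np

def mask_guided_conv(x, mask, kernel):
    """Partial-convolution-style filter (an assumed stand-in, not the paper's layer).

    x:      (H, W) feature map
    mask:   (H, W) array, 1 = visible pixel, 0 = occluded pixel
    kernel: (k, k) filter with odd k
    Returns the filtered map and an updated mask.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)   # zero out occluded pixels, then zero-pad
    mp = np.pad(mask, pad)       # padded border counts as "not visible"
    out = np.zeros(x.shape, dtype=float)
    new_mask = np.zeros(mask.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            xw = xp[i:i + k, j:j + k]
            mw = mp[i:i + k, j:j + k]
            valid = mw.sum()
            if valid > 0:
                # Rescale so the response does not shrink when pixels are missing.
                out[i, j] = (kernel * xw).sum() * (k * k / valid)
                new_mask[i, j] = 1.0
    return out, new_mask

def gated_mix(local_feat, global_feat, gate_logits):
    """Blend local (convolutional) and global (attention) features with a sigmoid gate."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))
    return g * local_feat + (1.0 - g) * global_feat

# Usage: a constant image with one occluded pixel and a mean filter.
x = np.ones((4, 4))
mask = np.ones((4, 4))
mask[1, 1] = 0.0
out, new_mask = mask_guided_conv(x, mask, np.ones((3, 3)) / 9.0)
mixed = gated_mix(out, np.zeros_like(out), np.zeros_like(out))
```

In this toy case the renormalization fills the occluded pixel from its visible neighbors, which is the kind of local context a convolution can contribute next to the long-range modeling of self-attention.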
What problem does this paper attempt to address?