Facial Highlight Removal with Cross-Context Attention and Texture Enhancement

Hongsheng Zheng, Wenju Xu, Zhenyu Wang, Xiao Lu, Chunxia Xiao
DOI: https://doi.org/10.1109/tcsvt.2024.3471875
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Facial highlight removal aims to identify and remove the specular highlight components in the facial image, ensuring that the generated image has a consistent facial tone and high-fidelity texture detail. Existing methods struggle to remove the highlight and recover the details in disturbed areas simultaneously, often resulting in specular residues or distorted local details (i.e., texture, illumination, and color). To rectify these issues, this work proposes a novel two-stage facial highlight removal network (FHR-Net), which mainly consists of a Cross-Context Attention Module (CCAM) and a Texture Enhancement Module (TEM). In the first stage, according to the detected highlight mask, the CCAM explicitly integrates cross-context information to obtain coarse highlight removal results consistent with the surrounding facial context. Building upon the coarse result, the TEM in the second stage utilizes patch-wise attention to refine the texture details in the highlight areas, thereby producing a high-fidelity facial image. To improve coherence between the removed highlight areas and non-highlight areas, this work introduces a face feature loss that makes the processed highlight-disturbed areas align well with the surrounding facial architecture. Additionally, to address the lack of high-quality datasets in the research community and satisfy the training demands for data-driven facial highlight removal, this work builds a real-world Paired Facial Specular-Diffuse (PFSD) dataset through cross-polarization. Experimental results on PFSD and other datasets demonstrate that FHR-Net can effectively remove the facial highlight and recover original color and texture details.