Transformer-Based adversarial network for semi-supervised face sketch synthesis

Zhihua Shi, Weiguo Wan
DOI: https://doi.org/10.1016/j.jvcir.2024.104204
IF: 2.887
2024-06-14
Journal of Visual Communication and Image Representation
Abstract: Face sketch synthesis converts real face images into artistic sketches and has broad potential in criminal investigation and entertainment. Existing methods usually train their generation models on paired face photo-sketch datasets, which are difficult to acquire. Moreover, their results often exhibit blurring, artifacts, and structural distortion, leading to inferior visual quality. To address these issues, we propose a semi-supervised Transformer-based adversarial network for face sketch synthesis, which can be trained on unpaired datasets. In the network, the Transformer encoder structure is modified with adaptive window attention (AWA) to better extract local and global facial features while minimizing computational complexity. A Transformer-based feature fusion module fuses the extracted features. In addition, a detail extractor module built on Laplacian operators carries the detail information of the face photo images over to the face sketch images. Within the detail extractor module, a mask operation removes textures that do not exist in the original face photo images. Experimental results on the CUHK, AR, XM2VTS, and CUFSF datasets show that the proposed face sketch synthesis method achieves excellent subjective and objective performance compared with current state-of-the-art methods.
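The detail extractor described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel, the thresholding rule, or how the mask is built, so everything below is an assumption made for illustration: a 4-neighbour Laplacian, edge-replicated borders, and a mask that keeps a sketch's high-frequency response only where the photo also shows detail. The names `laplacian`, `masked_detail`, and `threshold` are hypothetical and are not taken from the paper.

```python
def laplacian(img):
    """Apply a 4-neighbour Laplacian to a 2D list of floats (borders replicated)."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [
            px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * px(y, x)
            for x in range(w)
        ]
        for y in range(h)
    ]


def masked_detail(photo, sketch, threshold=0.1):
    """Keep sketch detail only where the photo also has detail.

    Assumption: the mask is a simple threshold on the magnitude of the
    photo's Laplacian response; the paper's actual mask may differ.
    """
    photo_detail = laplacian(photo)
    sketch_detail = laplacian(sketch)
    return [
        [s if abs(p) > threshold else 0.0 for p, s in zip(p_row, s_row)]
        for p_row, s_row in zip(photo_detail, sketch_detail)
    ]


if __name__ == "__main__":
    flat_photo = [[0.5] * 3 for _ in range(3)]            # no texture at all
    spotted_sketch = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]    # texture absent from the photo
    # Every sketch response is suppressed because the photo has no detail there.
    print(masked_detail(flat_photo, spotted_sketch))
```

With a flat photo the mask zeroes out every Laplacian response of the sketch, which is the behaviour the abstract attributes to the mask operation: textures absent from the original photo are removed.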
computer science, information systems, software engineering