Abstract: This paper presents a novel deep-learning framework that significantly enhances the transformation of rudimentary face sketches into high-fidelity colour images. Employing a Convolutional Block Attention-based Auto-encoder Network (CA2N), our approach effectively captures and enhances critical facial features through a block attention mechanism within an encoder-decoder architecture. Subsequently, the framework utilises a noise-induced conditional Generative Adversarial Network (cGAN) process that allows the system to maintain high performance even on domains unseen during training. These enhancements lead to considerable improvements in image realism and fidelity, with our model outperforming the best competing method by FID margins of 17, 23, and 38 on the CelebAMask-HQ, CUHK, and CUFSF datasets, respectively. The model sets a new state of the art in sketch-to-image generation, can generalize across sketch types, and offers a robust solution for applications such as criminal identification in law enforcement.
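
For context on the FID margins quoted in the abstract: the Fréchet Inception Distance is conventionally computed from the mean and covariance of Inception features of real and generated images, and lower values are better. The symbols below ($\mu_r, \Sigma_r$ for real images, $\mu_g, \Sigma_g$ for generated images) are standard notation and an assumption here, not notation taken from the paper.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$

Under this definition, a margin of 17 means the reported FID is 17 points lower than that of the best competing method on the same dataset.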

What problem does this paper attempt to address?

This paper addresses the difficulty of converting rough hand-drawn facial sketches into high-fidelity color images. The authors aim to improve the quality and accuracy of this conversion with a novel deep-learning framework. The main problems and the corresponding solutions are as follows:

### 1. **Insufficient details in rough sketches**

- **Problem**: Facial sketches drawn by witnesses or non-professional artists are usually simple and flawed. They lack key details, and stroke positions may not match the original image, which leaves the features sparse.
- **Solution**: The Convolutional Block Attention-based Auto-encoder Network (CA2N) independently identifies and processes five facial feature descriptors (left eye, right eye, nose, mouth, and the remaining facial area), improving the quality and accuracy of the representation derived from the initial sketch.

### 2. **Realism and fidelity of image generation**

- **Problem**: Existing methods can suffer from information loss and image blurring, especially when generating low-resolution images.
- **Solution**: The framework uses a noise-induced conditional Generative Adversarial Network (cGAN) for iterative refinement, combined with global and local loss functions (such as the Structural Similarity Index and L1 loss), to improve the realism and fidelity of the generated images.

### 3. **Generalization ability of the model**

- **Problem**: Existing methods perform poorly on unseen domains, such as sketch types not present in the training data.
- **Solution**: A noise-induced learning strategy lets the model adapt to different types of input during training, which strengthens generalization and keeps performance stable across sketch types.

### 4. **Requirements of specific application scenarios**

- **Problem**: In applications such as law enforcement, high-quality facial images must be generated accurately from rough sketches to help identify criminal suspects.
- **Solution**: The framework improves image quality and also shows robustness and adaptability across sketch types (hand-drawn, line, and Photoshop sketches), which suits the varied requirements of practical use.

### Summary

This paper proposes a novel deep-learning framework. By combining the Convolutional Block Attention-based Auto-encoder Network (CA2N) with the noise-induced conditional Generative Adversarial Network (cGAN), it addresses insufficient detail, lack of realism, and poor generalization when generating high-fidelity color images from rough facial sketches. The method achieves significant performance improvements on multiple benchmark datasets and shows potential in practical applications.
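
The summary mentions combining loss terms such as the Structural Similarity Index (SSIM) and L1 loss. The sketch below illustrates one common way to combine them; it is not the paper's implementation. The function names, the weight `alpha`, and the single-window SSIM (computed from whole-image statistics rather than averaged over sliding windows) are all assumptions made for illustration.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between two images with values in [0, 1]."""
    return float(np.mean(np.abs(pred - target)))

def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image statistics (a single window).

    Simplification: standard SSIM averages this quantity over local
    sliding windows; here one window covers the entire image.
    """
    c1 = (0.01 * data_range) ** 2  # stabiliser for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabiliser for the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

def combined_loss(pred, target, alpha=0.5):
    """Weighted sum of L1 and (1 - SSIM); alpha is a hypothetical weight."""
    structural = 1.0 - global_ssim(pred, target)
    return alpha * l1_loss(pred, target) + (1.0 - alpha) * structural

# Usage: compare a random "generated" image against itself and its inverse.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
print(combined_loss(image, image))        # close to zero for identical images
print(combined_loss(image, 1.0 - image))  # larger for a dissimilar image
```

The L1 term penalises per-pixel differences, while the SSIM term penalises structural disagreement; weighting the two is a design choice that would normally be tuned on validation data.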

Locally-Focused Face Representation for Sketch-to-Image Generation Using Noise-Induced Refinement

Face Sketch Landmarks Localization in the Wild

Attribute-Guided Sketch Generation

RefFaceNet: Reference-based Face Image Generation from Line Art Drawings

Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network

End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning

Face photo-line drawings synthesis based on local extraction preserving generative adversarial networks

Deep Generation of Face Images from Sketches

Recognizing Facial Sketches by Generating Photorealistic Faces Guided by Descriptive Attributes

Quality Guided Sketch-to-Photo Image Synthesis

[Laboratory diagnosis in heart transplant patients].

Cali-sketch: Stroke calibration and completion for high-quality face image generation from human-like sketches

Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs

Sketch to Image synthesis using attention based contextual GAN

Face photo-drawing conversion based on multi-scale feature-enhanced generative adversarial networks

Advanced 3D Face Reconstruction from Single 2D Images Using Enhanced Adversarial Neural Networks and Graph Neural Networks

Enhancing facial recognition accuracy through multi-scale feature fusion and spatial attention mechanisms

Face image-sketch synthesis via generative adversarial fusion

Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning

Fine-Granularity Face Sketch Synthesis

Recognizing Minimal Facial Sketch by Generating Photorealistic Faces with the Guidance of Descriptive Attributes