Locally-Focused Face Representation for Sketch-to-Image Generation Using Noise-Induced Refinement

Muhammad Umer Ramzan, Ali Zia, Abdelwahed Khamis, Ayman Elgharabawy, Ahmad Liaqat, Usman Ali
2024-11-28
Abstract: This paper presents a novel deep-learning framework that significantly enhances the transformation of rudimentary face sketches into high-fidelity colour images. Employing a Convolutional Block Attention-based Auto-encoder Network (CA2N), our approach effectively captures and enhances critical facial features through a block attention mechanism within an encoder-decoder architecture. The framework then utilises a noise-induced conditional Generative Adversarial Network (cGAN) process that allows the system to maintain high performance even on domains unseen during training. These enhancements lead to considerable improvements in image realism and fidelity: our model outperforms the best existing method by FID margins of 17, 23, and 38 on the CelebAMask-HQ, CUHK, and CUFSF datasets, respectively. The model sets a new state of the art in sketch-to-image generation, can generalize across sketch types, and offers a robust solution for applications such as criminal identification in law enforcement.
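The abstract names a block attention mechanism inside the encoder-decoder but does not spell it out. As a hedged illustration, the sketch below follows the widely used Convolutional Block Attention Module (CBAM) layout: channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, then spatial attention from a convolution over channel-wise average and max maps. It assumes CA2N uses a block of this general form; the function names, weight shapes, and kernel size are illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W channel, or a weight matrix
Tensor = List[Matrix]       # C x H x W feature map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP (C -> C/r -> C) with ReLU; the row count of w1 sets C/r."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(f: Tensor, w1: Matrix, w2: Matrix) -> Tensor:
    """Rescale each channel by sigmoid(MLP(avg-pool) + MLP(max-pool))."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    scores = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[s * x for x in row] for row in ch] for s, ch in zip(scores, f)]


def spatial_attention(f: Tensor, kernel: List[Matrix]) -> Tensor:
    """kernel is 2 x k x k (k odd): one slice for the avg map, one for the max map."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg_map = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    pad = len(kernel[0]) // 2
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for src, kmat in zip((avg_map, max_map), kernel):
                for di, krow in enumerate(kmat):
                    for dj, kv in enumerate(krow):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding at borders
                            s += kv * src[y][x]
            mask[i][j] = sigmoid(s)
    return [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in f]


def cbam_block(f: Tensor, w1: Matrix, w2: Matrix, kernel: List[Matrix]) -> Tensor:
    """Channel attention first, then spatial attention (hypothetical CA2N building block)."""
    return spatial_attention(channel_attention(f, w1, w2), kernel)
```

With all weights set to zero, both attention maps equal sigmoid(0) = 0.5, so the block simply scales the input by 0.25; learned weights instead emphasise informative channels and locations such as eye or mouth strokes.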
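For readers unfamiliar with the metric, the Fréchet Inception Distance (FID) compares the mean and covariance of Inception-v3 features computed on real and generated images, and lower values indicate that the two distributions are closer. The FID margins quoted in the abstract therefore presumably mean scores 17, 23, and 38 points lower than the strongest baseline. With feature statistics $(\mu_r, \Sigma_r)$ for real images and $(\mu_g, \Sigma_g)$ for generated ones, the standard definition is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```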
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenges of converting rough, hand-drawn facial sketches into high-fidelity color images. Specifically, the authors aim to significantly improve the quality and accuracy of this conversion by introducing a novel deep-learning framework. The main problems and solutions in this research are as follows:

### 1. **Insufficient detail in rough sketches**
- **Problem**: Facial sketches drawn by witnesses or non-professional artists are usually simple and imperfect. They lack key details, and stroke positions may not match the original face, resulting in sparse features.
- **Solution**: The Convolutional Block Attention-based Auto-encoder Network (CA2N) independently identifies and processes five facial feature descriptors (left eye, right eye, nose, mouth, and the remaining facial area), thereby improving the quality and accuracy of the initial sketch.

### 2. **Realism and fidelity of generated images**
- **Problem**: Existing methods can suffer from information loss and blurring when generating images, especially at low resolution.
- **Solution**: A noise-induced conditional Generative Adversarial Network (cGAN) performs iterative refinement, combined with global and local loss functions (such as the Structural Similarity Index (SSIM) and L1 loss), to significantly improve the realism and fidelity of the generated images.

### 3. **Generalization ability of the model**
- **Problem**: Existing methods perform poorly on unseen domains (such as different types of sketches).
- **Solution**: A noise-induced learning strategy lets the model adapt to varied input data during training, enhancing its generalization and ensuring good performance across sketch types.

### 4. **Requirements of specific application scenarios**
- **Problem**: In applications such as law enforcement, high-quality facial images must be generated accurately from rough sketches to help identify suspects.
- **Solution**: Beyond improving image quality, the framework is robust and adaptable across sketch types (such as hand-drawn, line, and Photoshop sketches), which suits the diverse requirements of practical applications.

### Summary
This paper proposes a novel deep-learning framework. Through the Convolutional Block Attention-based Auto-encoder Network (CA2N) and the noise-induced conditional Generative Adversarial Network (cGAN), it addresses insufficient detail, limited realism, and poor generalization when generating high-fidelity color images from rough facial sketches. The method achieves significant performance improvements on multiple benchmark datasets and shows potential for practical applications.
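The five-descriptor decomposition in point 1 can be illustrated by cropping component patches from an aligned sketch. This is a minimal sketch under stated assumptions: the summary gives no region coordinates, so `REGIONS` holds hypothetical (top, left, height, width) boxes for a 256×256 aligned face, and the "remaining facial area" descriptor is formed by blanking the four component boxes out of the full sketch, assuming dark strokes on a white (1.0) background.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # H x W grayscale sketch, values in [0, 1]

# Hypothetical (top, left, height, width) boxes for a 256 x 256 aligned face.
REGIONS: Dict[str, Tuple[int, int, int, int]] = {
    "left_eye": (96, 56, 32, 64),
    "right_eye": (96, 136, 32, 64),
    "nose": (112, 96, 64, 64),
    "mouth": (168, 80, 40, 96),
}


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut one rectangular patch out of the sketch."""
    top, left, h, w = box
    return [row[left:left + w] for row in img[top:top + h]]


def split_face(img: Image, background: float = 1.0) -> Dict[str, Image]:
    """Return four component patches plus a 'remainder' image with those boxes blanked."""
    parts = {name: crop(img, box) for name, box in REGIONS.items()}
    remainder = [row[:] for row in img]  # copy so the input sketch is left untouched
    for top, left, h, w in REGIONS.values():
        for y in range(top, top + h):
            for x in range(left, left + w):
                remainder[y][x] = background  # blank paper where a component was
    parts["remainder"] = remainder
    return parts
```

Each of the five outputs could then be encoded separately, which is one way a model can focus on local facial structure instead of treating the sketch only as a whole.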
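Point 2 pairs a pixel-wise L1 term with SSIM. The summary does not state how the terms are weighted or windowed, so the sketch below assumes a simple weighted sum, loss = λ₁·L1 + λ₂·(1 − SSIM), with hypothetical weights `lam_l1` and `lam_ssim`. For brevity it computes SSIM over the whole image as a single window (practical implementations usually average SSIM over local Gaussian windows), using the standard constants C1 = (0.01·L)² and C2 = (0.03·L)² for dynamic range L.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image, values in [0, 1]


def l1_loss(a: Image, b: Image) -> float:
    """Mean absolute pixel difference."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def global_ssim(a: Image, b: Image, data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as one window (a simplification)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def reconstruction_loss(pred: Image, target: Image,
                        lam_l1: float = 1.0, lam_ssim: float = 1.0) -> float:
    """Hypothetical weighting: lam_l1 * L1 + lam_ssim * (1 - SSIM)."""
    return lam_l1 * l1_loss(pred, target) + lam_ssim * (1.0 - global_ssim(pred, target))
```

The L1 term penalises per-pixel intensity errors, while the SSIM term rewards matching local means, contrast, and correlation, which tends to discourage the blurry outputs mentioned in point 2.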
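Point 3 attributes robustness to a noise-induced learning strategy, but the summary does not say where the noise is injected or how strong it is. One plausible reading, sketched here as an assumption rather than the authors' procedure, is to augment each training sketch with zero-mean Gaussian noise whose standard deviation is drawn per sample up to a hypothetical `sigma_max`, then clip back to the valid pixel range. A seeded `random.Random` keeps the augmentation reproducible.

```python
import random
from typing import List

Image = List[List[float]]  # H x W sketch, values in [0, 1]


def add_training_noise(img: Image, rng: random.Random, sigma_max: float = 0.1) -> Image:
    """Perturb a sketch with Gaussian noise of random strength, clipped to [0, 1].

    sigma_max is a hypothetical hyper-parameter, not a value from the paper.
    """
    sigma = rng.uniform(0.0, sigma_max)  # vary the noise strength per sample
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def noisy_batch(batch: List[Image], seed: int = 0, sigma_max: float = 0.1) -> List[Image]:
    """Apply independent noise to every sketch in a batch, reproducibly."""
    rng = random.Random(seed)
    return [add_training_noise(img, rng, sigma_max) for img in batch]
```

Randomising the noise level, rather than fixing it, exposes the generator to a spread of stroke quality during training, which is one simple way to encourage tolerance of sketch styles not seen at training time.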