Abstract: This paper introduces a novel approach to unsupervised image-to-image translation that aims to overcome the difficulty existing methods have in accurately capturing both the shape of the source domain and the style of the target domain. The proposed method, called Semantic Cooperative Shape Perception (SCSP), improves the quality of generated images in two ways. First, the SCSP model employs a fusion generator that divides the mapping process into a unique texture part and a shared semantic part. Each part uses a different network structure and different constraints, so each learns a specific kind of information: the unique texture generator emphasizes the style and texture details of the target domain, while the shared semantic generator focuses on the semantic information present in the source domain. This separation enables the sub-generators to extract and restore different aspects of the target domain more effectively. Second, a shape perception loss is introduced to improve the similarity of semantic images. By imposing constraints on the semantic graphs of both the generated and input images, it enhances the shared semantic generator's ability to perceive semantic information related to the same object. As a result, the proposed method maintains semantic consistency during translation, leading to improved authenticity and image quality. Experimental results on four datasets (horse2zebra, tiger2leopard, summer2winter, and photo2vangogh) demonstrate that the SCSP model achieves state-of-the-art visual results and favorable evaluation metrics.

SCSP: an Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception
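
The two ideas summarized in the abstract, a fusion of a texture output with a semantic output and a loss that compares the semantic content of the input and generated images, can be illustrated with a minimal sketch. It is not the SCSP implementation. The function names `fuse` and `shape_perception_loss`, the weighted-sum fusion, the weight `alpha`, the L1 distance, and the treatment of the semantic graph as a per-pixel map are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List

# A tiny single-channel "image" stored as nested lists of floats.
Image = List[List[float]]


def fuse(texture_out: Image, semantic_out: Image, alpha: float = 0.5) -> Image:
    """Blend the outputs of the two sub-generators pixel by pixel.

    ASSUMPTION: a fixed weighted sum with weight `alpha`. The abstract does
    not say how the fusion generator combines its two parts.
    """
    return [
        [alpha * t + (1.0 - alpha) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(texture_out, semantic_out)
    ]


def shape_perception_loss(sem_input: Image, sem_generated: Image) -> float:
    """Mean absolute difference between two semantic maps.

    ASSUMPTION: an L1 distance over per-pixel maps. The abstract only states
    that the semantics of the input and generated images are constrained.
    """
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(sem_input, sem_generated)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    texture = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical texture-branch output
    semantic = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical semantic-branch output
    generated = fuse(texture, semantic, alpha=0.5)
    print("fused image:", generated)

    # Semantic maps that agree everywhere give zero loss.
    mask = [[1.0, 0.0], [0.0, 1.0]]
    print("loss (identical maps):", shape_perception_loss(mask, mask))
```

In this sketch the loss is zero when the semantic maps of the input and generated images match and grows as the object outline drifts, which is the behavior the abstract attributes to the shape perception loss.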