Multi-Domain Image-to-Image Translation with Cross-Granularity Contrastive Learning

Huiyuan Fu, Jin Liu, Ting Yu, Xin Wang, Huadong Ma
DOI: https://doi.org/10.1145/3656048
2024-04-04
Abstract: The objective of multi-domain image-to-image translation is to learn the mapping from a source domain to a target domain in multiple image domains while preserving the content representation of the source domain. Despite the importance and recent efforts, most previous studies disregard the large style discrepancy between images and instances in various domains, or fail to capture instance details and boundaries properly, resulting in poor translation results for rich scenes. To address these problems, we present an effective architecture for multi-domain image-to-image translation that only requires one generator. Specifically, we provide detailed procedures for capturing the features of instances throughout the learning process, as well as learning the relationship between the style of the global image and that of a local instance in the image by enforcing the cross-granularity consistency. In order to capture local details within the content space, we employ a dual contrastive learning strategy that operates at both the instance and patch levels. Extensive studies on different multi-domain image-to-image translation datasets reveal that our proposed method outperforms state-of-the-art approaches.
computer science, information systems, theory & methods, software engineering
What problem does this paper attempt to address?
This paper addresses multi-domain image-to-image translation: learning the mapping from a source domain to a target domain across multiple image domains while preserving the content representation of the source domain. Existing methods often overlook the large style discrepancy between images and instances across domains, or fail to capture instance details and boundaries accurately, which leads to poor translation results for rich, complex scenes. To address these issues, the paper proposes an architecture that requires only one generator. The authors describe detailed procedures for capturing instance features throughout the learning process and for learning the relationship between the global image style and the local instance style by enforcing cross-granularity consistency. To capture local details within the content space, they adopt a dual contrastive learning strategy that operates at both the instance and patch levels. The paper reports that this approach outperforms state-of-the-art methods on several multi-domain image-to-image translation datasets.
The contributions of the paper mainly include:
1. Introducing a cross-granularity contrastive learning framework for high-quality multi-domain image-to-image translation.
2. Designing specific steps that incorporate instance features into the learning process and that guide the learned relationship between instance style and image style by enforcing cross-granularity consistency.
3. Introducing instance-level and patch-level contrastive learning modules to preserve the local details of the original image and its instances.
4. Validating the proposed method through extensive qualitative and quantitative experiments on standard benchmarks.
In addition, compared with other methods, the model requires only one generator for instance-aware mapping. This simplifies the model structure and lets instances and global images share certain features, which makes the generated instances easier to integrate into the translated images.
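To make the dual contrastive strategy more concrete, the following is a minimal sketch of how a contrastive loss could be applied at both the patch level and the instance level. It is not the authors' implementation: the summary does not give the loss formula, so this sketch assumes a standard InfoNCE objective on cosine similarities, and the names `info_nce`, `dual_contrastive_loss`, `temperature`, `w_patch`, and `w_instance` are hypothetical.

```python
import numpy as np


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for a single query feature (assumed form, not from the paper).

    query:     (d,) feature from the translated image (a patch or an instance crop)
    positive:  (d,) feature from the same location in the source image
    negatives: (n, d) features from other locations in the source image
    """
    def normalize(x):
        # L2-normalize so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(query)
    k_pos = normalize(positive)
    k_neg = normalize(negatives)

    # Index 0 holds the positive logit, the rest are negative logits.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos


def dual_contrastive_loss(patch_triplet, instance_triplet,
                          w_patch=1.0, w_instance=1.0):
    """Weighted sum of a patch-level and an instance-level InfoNCE term.

    Each triplet is (query, positive, negatives). The weights are
    hypothetical hyperparameters, not values reported in the paper.
    """
    return (w_patch * info_nce(*patch_triplet)
            + w_instance * info_nce(*instance_triplet))
```

In this sketch, the loss is low when a translated patch or instance stays close to its source counterpart and far from other source locations, which is the sense in which a contrastive term can help preserve local content.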