Abstract: Recent research in cross-domain image retrieval has focused on two challenging issues: handling domain variations in the data and coping with insufficient training labels. However, these problems have usually been studied separately, which limits the practicality and significance of the results. The existing cross-domain setting is also restricted to cases where domain labels are known during training and all samples carry semantic category information or instance correspondences. In this paper, we address a more general and practical problem: fully unsupervised domain-agnostic image retrieval under the domain-unknown setting, where no annotations are provided. Our approach tackles the domain-variation and missing-label challenges simultaneously. We introduce a fully unsupervised One-Shot Synthesis-based Contrastive learning method (termed OSSCo) that projects images from different data distributions into a shared feature space for similarity measurement. To handle the domain-unknown setting, we propose One-Shot unpaired image-to-image Translation (OST) between a randomly selected one-shot image and the rest of the training images. By minimizing the global distance between the original images and the images generated by OST, the model learns domain-agnostic representations. To address the label-unknown setting, we employ contrastive learning with a synthesis-based transform module obtained from the OST training. This enables effective representation learning without any annotations or external constraints. We evaluate the proposed method on diverse datasets, and the results demonstrate its effectiveness. Notably, our approach achieves performance comparable to current state-of-the-art supervised methods.
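
The two training signals named in the abstract can be illustrated with a minimal sketch. This is not the OSSCo implementation: the abstract does not specify the loss functions, so the InfoNCE form of the contrastive term, the use of the distance between mean embeddings as the "global distance", the temperature value, and the function names `info_nce` and `global_distance` are all assumptions made for illustration. Embeddings are plain Python lists standing in for the outputs of a feature encoder.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(originals, synthesized, temperature=0.1):
    """Assumed contrastive term: the positive for originals[i] is synthesized[i]
    (its translated view); every other synthesized[j] acts as a negative."""
    q = [l2_normalize(v) for v in originals]
    k = [l2_normalize(v) for v in synthesized]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)

def global_distance(originals, synthesized):
    """Assumed 'global distance': squared Euclidean distance between the
    mean embedding of the originals and that of the synthesized images."""
    dim = len(originals[0])
    mean_o = [sum(v[d] for v in originals) / len(originals) for d in range(dim)]
    mean_s = [sum(v[d] for v in synthesized) / len(synthesized) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_o, mean_s))

# Toy embeddings: each synthesized view stays close to its original.
originals = [[1.0, 0.0], [0.0, 1.0]]
synthesized = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = info_nce(originals, synthesized)
shuffled_loss = info_nce(originals, list(reversed(synthesized)))
```

In this toy setup the contrastive term is lower when each original is paired with its own synthesized view than when the pairs are shuffled, which is the behavior a retrieval embedding needs. How the two terms are weighted against each other is not stated in the abstract and is left out here.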

Generalized Image Embedding for Multi-Domain Image Retrieval

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Semantic Image Retrieval Based on Multiple-Instance Learning

Domain-Specific Modeling and Semantic Alignment for Image-Based 3D Model Retrieval

Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders

Universal Model for Multi-Domain Medical Image Retrieval

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

Coupled Binary Embedding for Large-Scale Image Retrieval

Learning Transferable and Discriminative Representations for 2D Image-Based 3D Model Retrieval

Generalized Multi-view Embedding for Visual Recognition and Cross-modal Retrieval

Seeing the Big Picture: Deep Embedding with Contextual Evidences

Depthwise Convolution is All You Need for Learning Multiple Visual Domains

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

Multi-mapping Image-to-Image Translation Via Learning Disentanglement

Fully Unsupervised Domain-Agnostic Image Retrieval

Collaborative Index Embedding for Image Retrieval

Random Projection Tree And Multiview Embedding For Large-Scale Image Retrieval

Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval

Tensor Index for Large Scale Image Retrieval

Unsupervised Multi-Domain Multimodal Image-to-image Translation with Explicit Domain-Constrained Disentanglement

Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning