Abstract: Zero-shot sketch-based image retrieval (ZS-SBIR) is an extremely challenging cross-modal retrieval task. In ZS-SBIR, hand-drawn sketches are used as queries to retrieve corresponding natural images in zero-shot scenarios. Existing methods use diverse loss functions to guide deep neural networks (DNNs) to align the feature representations of sketches and images. In general, these methods supervise only the last layer of the DNN and update all other layers through back-propagation. However, this strategy cannot effectively optimize the intermediate layers, potentially hindering retrieval performance. To address this issue, we propose a deep supervision network with contrastive learning (DSNCL) approach for ZS-SBIR. Specifically, we employ a novel deep supervision training method that attaches multiple projection heads to the intermediate layers of the DNN. These projection heads map multi-level features to a normalized embedding space and are trained by contrastive learning. The proposed method guides the intermediate layers to learn invariance to various data augmentations, thereby aligning the feature representations of sketches and images. This significantly narrows both the domain gap and the semantic gap between the two modalities. In addition, because contrastive learning optimizes the intermediate layers directly, these layers become easier to optimize. Furthermore, we investigate a cross-batch metric (CBM) learning mechanism, which constructs a semantic queue that stores samples from different batches for metric learning, to further improve performance in ZS-SBIR applications. Comprehensive experimental results on the Sketchy and TU-Berlin datasets validate the superiority of our DSNCL method over existing state-of-the-art methods.
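The two mechanisms named in the abstract, a contrastive loss applied at several network depths and a queue that carries samples across batches, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss form, the level weights, or the queue layout, so the InfoNCE-style loss, the `temperature` value, and the names `info_nce`, `deep_supervision_loss`, and `SemanticQueue` are all assumptions made for illustration. The projection heads are omitted; each level's features are assumed to have already been projected.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Assumed InfoNCE-style loss: anchors[i] should match positives[i]
    against every other entry of `positives` in the batch."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def deep_supervision_loss(sketch_levels, image_levels, weights=None, temperature=0.1):
    """Weighted sum of the contrastive loss over features from several depths.

    sketch_levels[k] and image_levels[k] hold the (already projected)
    embeddings of level k for a batch of paired sketches and images.
    """
    weights = weights or [1.0] * len(sketch_levels)  # hypothetical: equal weights
    return sum(
        w * info_nce(s, im, temperature)
        for w, s, im in zip(weights, sketch_levels, image_levels)
    )


class SemanticQueue:
    """Hypothetical FIFO memory of (embedding, label) pairs from past batches."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)  # oldest entries drop out first

    def enqueue(self, embeddings, labels):
        for e, y in zip(embeddings, labels):
            self.items.append((l2_normalize(e), y))

    def labels(self):
        return [y for _, y in self.items]

    def __len__(self):
        return len(self.items)
```

In a training loop under these assumptions, the per-level losses would be added to the final-layer objective, and the queue contents would serve as extra positives and negatives for the current batch.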

Deep Supervision Network with Contrastive Learning for Zero-Shot Sketch-Based Image Retrieval

Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval

SceneSketcher-v2: Fine-Grained Scene-Level Sketch-Based Image Retrieval Using Adaptive GCNs

Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval

Contour detection network for zero-shot sketch-based image retrieval

Transferable Coupled Network for Zero-Shot Sketch-Based Image Retrieval

Stacked Semantic-Guided Network for Zero-Shot Sketch-Based Image Retrieval

Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network

Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval

Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

Zero-shot sketch-based image retrieval via adaptive relation-aware metric learning

Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR

Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval

Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval

CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

Indicative Vision Transformer for end-to-end zero-shot sketch-based image retrieval

Distribution Aligned Feature Clustering for Zero-Shot Sketch-Based Image Retrieval

Deep Reinforced Attention Regression for Partial Sketch Based Image Retrieval