Abstract: This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that outperforms the prior state of the art by ~11%. This is achieved not through complicated design, but by addressing two critical issues facing the community: (i) the gold-standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches to train a high-accuracy model. For the former, we propose a simple modification to the standard triplet loss that explicitly enforces separation amongst photo/sketch instances. For the latter, we put forward a novel knowledge distillation module that can leverage photo data for model training. Both modules are then plugged into a novel plug-and-play training paradigm that allows for more stable training. More specifically, for (i) we employ an intra-modal triplet loss amongst sketches to pull sketches of the same instance closer together and away from those of other instances, and one more amongst photos to push away different photo instances while bringing closer a structurally augmented version of the same photo (offering a gain of ~4-6%). To tackle (ii), we first pre-train a teacher on the large set of unlabelled photos using the aforementioned intra-modal photo triplet loss. Then we distill the contextual similarity present amongst the instances in the teacher's embedding space into the student's embedding space, by matching the distribution over inter-feature distances of the respective samples in both embedding spaces (delivering a further gain of ~4-5%). Apart from significantly outperforming prior art, our model also yields satisfactory results when generalising to new classes. Project page: https://aneeshan95.github.io/Sketch_PVT/
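
The two loss ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the margin and temperature values, the choice of Euclidean distance, the softmax normalisation of distances, and the use of KL divergence to "match" the two distributions are all assumptions made for illustration. The abstract only states that a triplet loss is applied within each modality and that distributions over inter-feature distances are matched between teacher and student.

```python
import math

def l2(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss. For the intra-modal photo case, `positive` would be
    an augmented view of the anchor photo and `negative` a different photo."""
    return max(0.0, margin + l2(anchor, positive) - l2(anchor, negative))

def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distance_distribution(embeddings, i, temperature=1.0):
    """Distribution over the other samples, built from negative distances to
    sample i (closer samples receive more probability mass)."""
    neg_dists = [-l2(embeddings[i], e) for j, e in enumerate(embeddings) if j != i]
    return softmax(neg_dists, temperature)

def distillation_loss(teacher, student, temperature=1.0, eps=1e-12):
    """Mean KL(teacher || student) between per-sample distance distributions.
    KL divergence is an assumed choice of matching criterion."""
    n = len(teacher)
    total = 0.0
    for i in range(n):
        p = distance_distribution(teacher, i, temperature)
        q = distance_distribution(student, i, temperature)
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / n

# Toy embeddings (hypothetical values): three samples per space.
teacher_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
student_emb = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

same_geometry = distillation_loss(teacher_emb, teacher_emb)
different_geometry = distillation_loss(teacher_emb, student_emb)
```

Because this loss compares distances rather than raw features, the teacher and student embeddings in the sketch do not need to share a dimensionality; only the number of samples must agree.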

TC-Net for iSBIR

SceneSketcher-v2: Fine-Grained Scene-Level Sketch-Based Image Retrieval Using Adaptive GCNs

Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network

SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches

Transferable Coupled Network for Zero-Shot Sketch-Based Image Retrieval

Sketch-R2CNN: an Attentive Network for Vector Sketch Recognition

Semi-Heterogeneous Three-Way Joint Embedding Network for Sketch-Based Image Retrieval

Contour detection network for zero-shot sketch-based image retrieval

Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval

Indicative Vision Transformer for end-to-end zero-shot sketch-based image retrieval

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval

A Novel Visual-Region-Descriptor-based Approach to Sketch-based Image Retrieval

A hierarchical residual network with compact triplet-center loss for sketch recognition

Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network

SketchPointNet: A Compact Network for Robust Sketch Recognition

Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples

AE-Net: Fine-grained Sketch-Based Image Retrieval Via Attention-Enhanced Network

Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

3D-Isrnet