Abstract: Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.io
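
To make the idea of a warp flow concrete, the following is a minimal sketch of how a dense displacement field could be used to transfer keypoints from a photo to a sketch. It is not the authors' implementation: the function names `bilinear_sample` and `transfer_keypoints`, the flow layout `(H, W, 2)` holding per-pixel `(dx, dy)` offsets, and the photo-to-sketch direction are all assumptions made for illustration, and the toy flow is a uniform shift rather than a model prediction.

```python
import numpy as np


def bilinear_sample(field, x, y):
    """Bilinearly sample an (H, W, C) field at a continuous (x, y) location."""
    h, w = field.shape[:2]
    # Clamp to the valid pixel range so border queries stay defined.
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * field[y0, x0] + ax * field[y0, x1]
    bottom = (1 - ax) * field[y1, x0] + ax * field[y1, x1]
    return (1 - ay) * top + ay * bottom


def transfer_keypoints(flow, keypoints):
    """Map (x, y) keypoints through a dense displacement field.

    Assumed convention: flow[y, x] = (dx, dy), the offset from a photo
    pixel to its corresponding location in the sketch.
    """
    moved = []
    for x, y in keypoints:
        dx, dy = bilinear_sample(flow, x, y)
        moved.append((x + dx, y + dy))
    return np.array(moved)


# Toy flow: every pixel shifts 2 px right and 1 px up.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 2.0
flow[..., 1] = -1.0

photo_kps = [(1.5, 3.0), (4.0, 4.0)]
sketch_kps = transfer_keypoints(flow, photo_kps)
```

Sampling the flow bilinearly rather than rounding to the nearest pixel keeps the transferred locations sub-pixel accurate, which matters when predictions are compared against annotated keypoints.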

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Electors Voting for Fast Automatic Shape Correspondence

Direct Alignment with Generalized Correspondences: A Unified Framework for Structure-Based Visual Pose Estimation

SSC: Semantic Scan Context for Large-Scale Place Recognition

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

Geometry-aware Feature Matching for Large-Scale Structure from Motion

Semantic-Aware Fine-Grained Correspondence

Object-Aware Dense Semantic Correspondence

Fine-grained Object Semantic Understanding from Correspondences

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data

Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence

Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation

Learning to Identify Correct 2D-2D Line Correspondences on Sphere

Multi-Stage Network With Geometric Semantic Attention for Two-View Correspondence Learning

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

Learning Dense Correspondences between Photos and Sketches

Human Correspondence Consensus for 3D Object Semantic Understanding

Pixel-level Semantic Correspondence Through Layout-aware Representation Learning and Multi-scale Matching Integration

Delving into Shape-aware Zero-shot Semantic Segmentation