Abstract: Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.io

Unsupervised Feature Learning for Dense Correspondences Across Scenes

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

Learning Fine-Grained Features for Pixel-wise Video Correspondences

Multi-scale Matching Networks for Semantic Correspondence

Pixel-level Semantic Correspondence Through Layout-aware Representation Learning and Multi-scale Matching Integration

Ensemble Learning with Advanced Fast Image Filtering Features for Semi-Global Matching

Correspondence Transformers with Asymmetric Feature Learning and Matching Flow Super-Resolution

Learning Camera Localization via Dense Scene Matching

Efficient Dynamic Correspondence Network

SCENES: Subpixel Correspondence Estimation With Epipolar Supervision

Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo

Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images

iMatching: Imperative Correspondence Learning

Learning Dense Correspondences between Photos and Sketches

Universal Correspondence Network

Optical and SAR Image Matching Using Pixelwise Deep Dense Features

Unsupervised Multi-Constraint Deep Neural Network for Dense Image Matching

Learning Deep Correspondence Through Prior and Posterior Feature Constancy

Unsupervised Non-Rigid Point Cloud Matching through Large Vision Models

Deep Graph Matching Based Dense Correspondence Learning Between Non-Rigid Point Clouds

A Dense Optical Flow-Based Feature Matching Approach in Visual Odometry