Abstract: Can we build a universal detector that visually recognizes unseen objects when no training exemplars are available? Such a detector is highly desirable, because human vocabulary contains hundreds of thousands of object concepts, yet few of them have labeled image examples. In this study, we attempt to build such a universal detector to predict concepts in the absence of training data. First, by considering both semantic relatedness and visual variance, we mine a set of realistic small-semantic-gap (SSG) concepts from a large-scale image corpus, namely ImageNet, which comprises 4961 concepts and nearly 4 million images. The discovered SSG concepts can be depicted well by visual models, and their detectors deliver reasonably good recognition accuracy. Building on these distinctive visual models, we then leverage the semantic ontology and the co-occurrence statistics of concepts to extend visual recognition to unseen concepts. The rationale is that object concepts generally co-occur in real-life images. Their visual co-occurrence and semantic ontology allow concept recognition to go beyond learning from image exemplars, thereby enabling the detector to predict unseen realistic concepts without training samples. To the best of our knowledge, this work is the first attempt to measure the semantic gaps of a large number of concepts and to leverage visually learnable concepts to predict concepts with no training images available. Tests on the NUS-WIDE dataset demonstrate that the selected concepts with small semantic gaps can be modeled well and that the prediction of unseen concepts delivers promising results, with accuracy comparable to preliminary training-based methods.
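The abstract does not state the exact rule for scoring an unseen concept from the seen SSG detectors. As a minimal sketch, assuming a linear blend of co-occurrence statistics and ontology-based relatedness, an unseen concept's score for one image can be taken as a weighted average of the seen detectors' outputs. The function `predict_unseen`, the mixing weight `alpha`, and all example values are hypothetical illustrations, not the paper's formulation.

```python
from typing import Dict


def predict_unseen(
    seen_scores: Dict[str, float],
    cooccurrence: Dict[str, float],
    relatedness: Dict[str, float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical scoring rule for one unseen concept on one image.

    seen_scores:  detector confidence in [0, 1] for each seen (SSG) concept.
    cooccurrence: normalized co-occurrence of the unseen concept with each seen concept.
    relatedness:  ontology-based semantic relatedness to each seen concept.
    alpha:        assumed mixing weight between the two kinds of evidence.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for concept, score in seen_scores.items():
        # Blend visual co-occurrence and ontology relatedness into one weight.
        weight = (alpha * cooccurrence.get(concept, 0.0)
                  + (1.0 - alpha) * relatedness.get(concept, 0.0))
        weighted_sum += weight * score
        total_weight += weight
    # No related seen concept means no evidence for the unseen one.
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# Toy example (all numbers invented): score the unseen concept "beach".
seen = {"water": 0.9, "sand": 0.8, "car": 0.1}
beach_cooc = {"water": 0.6, "sand": 0.7, "car": 0.05}
beach_rel = {"water": 0.4, "sand": 0.5, "car": 0.0}
beach_score = predict_unseen(seen, beach_cooc, beach_rel)
print(f"beach: {beach_score:.3f}")
```

Because the result is a normalized weighted average, it stays within the range of the seen detectors' scores, and seen concepts that rarely co-occur with the unseen one and are semantically distant (here `car`) contribute almost nothing.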

ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections

Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

Explore Visual Concept Formation for Image Classification

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

Towards a Universal Detector by Mining Concepts with Small Semantic Gaps

Visual Concept-Metaconcept Learning

An Efficient Concept Detection System via Sparse Ensemble Learning

Regularized Semi-Supervised Latent Dirichlet Allocation for Visual Concept Learning

Fundamental Visual Concept Learning from Correlated Images and Text

On the Sampling of Web Images for Learning Visual Concept Classifiers

Shapelearner: Towards Shape-Based Visual Knowledge Harvesting

Sampling and Ontologically Pooling Web Images for Visual Concept Learning

Webly-supervised Visual Concept Learning with Cardinality Guided Instance Mining and Clustered Multitask Refinement

Computational Baby Learning

Semantic Context Learning with Large-Scale Weakly-Labeled Image Set

SegDiscover: Visual Concept Discovery via Unsupervised Semantic Segmentation

Rectifying Self Organizing Maps for Automatic Concept Learning from Web Images

Discriminative Structure Learning for Semantic Concept Detection with Graph Embedding

Robust Semantic Concept Detection in Large Video Collections