Abstract: Visual concept learning often requires a large set of training images. In practice, however, acquiring noise-free training labels with sufficient positive examples is expensive. A plausible solution for training data collection is to sample the widely available user-tagged images from social media websites. Given the general belief that the probability of correct tagging is higher than that of incorrect tagging, such a solution sounds feasible, but it is not without challenges. First, user tags can be subjective and, to a certain extent, ambiguous. For instance, an image tagged with "whales" may simply be a picture of an ocean museum. Learning the concept "whales" from such training samples will not be effective. Second, user tags can be overly abbreviated. For instance, an image depicting the concept "wedding" may be tagged with "love" or simply the couple's names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting tagged images for concept learning, investigating how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach, named semantic field, for predicting the relevance between a target concept and the tag list associated with an image. Specifically, the relevance is determined through concept-tag co-occurrence by exploring external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than other approaches such as those based on keyword sampling and tag voting.
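
The abstract does not state the scoring formula, so the following is only a rough sketch of the idea of ranking images by how related their whole tag list is to a target concept, contrasted with plain keyword sampling. The relatedness table, the averaging rule, and all function and image names are assumptions made for illustration; in the paper the concept-tag scores would be derived from co-occurrence in external sources such as WordNet and Wikipedia.

```python
from typing import Dict, List, Tuple

# Hypothetical concept-tag relatedness scores in [0, 1]. These numbers are
# invented for illustration; they stand in for co-occurrence statistics
# gathered from external sources.
RELATEDNESS: Dict[Tuple[str, str], float] = {
    ("whales", "whales"): 1.0,
    ("whales", "humpback"): 0.9,
    ("whales", "ocean"): 0.6,
    ("whales", "boat"): 0.4,
    ("whales", "museum"): 0.1,
    ("whales", "ticket"): 0.0,
}

# Hypothetical user-tagged images: image name -> tag list.
IMAGES: Dict[str, List[str]] = {
    "img_museum": ["whales", "museum", "ticket"],
    "img_sea": ["humpback", "ocean", "boat"],
    "img_whale": ["whales", "humpback", "ocean"],
}


def tag_list_relevance(concept: str, tags: List[str]) -> float:
    """Mean concept-tag relatedness over the tag list (assumed aggregation)."""
    if not tags:
        return 0.0
    total = sum(RELATEDNESS.get((concept, tag), 0.0) for tag in tags)
    return total / len(tags)


def keyword_sampling(concept: str, images: Dict[str, List[str]]) -> List[str]:
    """Baseline: keep every image whose tag list contains the concept word."""
    return [name for name, tags in images.items() if concept in tags]


def rank_by_relevance(concept: str, images: Dict[str, List[str]]) -> List[str]:
    """Order images from most to least relevant tag list (ties broken by name)."""
    scored = [(tag_list_relevance(concept, tags), name)
              for name, tags in images.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    print(keyword_sampling("whales", IMAGES))
    print(rank_by_relevance("whales", IMAGES))
```

In this toy data, keyword sampling accepts the museum picture because it carries the tag "whales" and misses the image tagged only "humpback", whereas scoring the full tag list ranks the museum picture last and still surfaces the untagged whale image.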
