Fundamental Visual Concept Learning from Correlated Images and Text.
Youtian Du, Hang Wang, Yunbo Cui, Xin Huang
DOI: https://doi.org/10.1109/tip.2019.2899944
IF: 10.6
2019-01-01
IEEE Transactions on Image Processing
Abstract: Heterogeneous web media consists of many visual concepts, such as objects, scenes, and activities, which cannot be semantically decomposed. The task of learning fundamental visual concepts (FVCs) plays an important role in automatically understanding the elements that compose all visual media, as well as in applications of retrieval, annotation, and so on. In this paper, we formulate the problem of FVC learning and propose an approach to this problem called neighboring concept distributing (NCD). Our approach models all data using a concept graph, which considers the visual patches in images as nodes and generates the inter-image edges between visual patches in different images and the intra-image edges between visual patches in the same image. The NCD approach distributes semantic information from images to visual patches based on measurements over the concept graph, including fitness, distinctiveness, smoothness, and sparseness, without any pre-trained concept detectors or classifiers. We analyze the learnability of the proposed approach and find that, under some conditions, all concepts can be correctly learned with an arbitrarily high probability as the size of the data increases. We demonstrate the performance of the NCD approach using three public datasets. The experimental results show that our approach outperforms state-of-the-art approaches when learning visual concepts from correlated media.
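The concept graph described in the abstract can be illustrated with a minimal sketch. The abstract does not say how edges are selected or weighted, so everything specific here is an assumption for illustration only: the function names `cosine` and `build_concept_graph`, the use of cosine similarity with a fixed `sim_threshold` for inter-image edges, and the choice to connect every pair of patches within the same image. It is not the authors' NCD implementation and covers only graph construction, not the distribution of semantic information.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_concept_graph(patches, sim_threshold=0.9):
    """Build a toy concept graph over visual patches.

    patches: list of (image_id, feature_vector) pairs; each patch is a node,
             identified by its index in the list.
    Returns (intra_edges, inter_edges) as sets of index pairs (i, j), i < j.

    Assumptions (not taken from the paper): intra-image edges link every pair
    of patches in the same image; inter-image edges link patches from
    different images whose cosine similarity is at least sim_threshold.
    """
    intra_edges, inter_edges = set(), set()
    for i, j in combinations(range(len(patches)), 2):
        image_i, feature_i = patches[i]
        image_j, feature_j = patches[j]
        if image_i == image_j:
            intra_edges.add((i, j))
        elif cosine(feature_i, feature_j) >= sim_threshold:
            inter_edges.add((i, j))
    return intra_edges, inter_edges


if __name__ == "__main__":
    # Two hypothetical images with two patches each and 2-D toy features.
    toy_patches = [
        ("img_a", [1.0, 0.0]),
        ("img_a", [0.0, 1.0]),
        ("img_b", [0.9, 0.1]),
        ("img_b", [0.1, 0.9]),
    ]
    intra, inter = build_concept_graph(toy_patches)
    print("intra-image edges:", sorted(intra))
    print("inter-image edges:", sorted(inter))
```

In this toy setup, patches from the same image are always linked, while patches from different images are linked only when their features are close, which mirrors the two edge types named in the abstract.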