Extracting Effective Image Attributes with Refined Universal Detection
Qiang Yu,Xinyu Xiao,Chunxia Zhang,Lifei Song,Chunhong Pan
DOI: https://doi.org/10.3390/s21010095
IF: 3.9
2020-12-25
Sensors
Abstract:Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.
engineering, electrical & electronic,chemistry, analytical,instruments & instrumentation