Multi-scale Salient Region and Relevant Visual Keywords Based Model for Automatic Image Annotation

Xiao Ke, Wenzhong Guo
DOI: https://doi.org/10.1007/s11042-014-2318-2
IF: 2.577
2014-01-01
Multimedia Tools and Applications
Abstract: Automatic image annotation is a vital and challenging problem in pattern recognition and image understanding. Existing models extract visual features directly from segmented image regions. Because a segmented region may still contain multiple objects, the extracted features may not describe the region effectively. In addition, existing models do not consider the visual representations of the corresponding keywords, so the final annotation results contain many irrelevant annotations that do not relate to any part of the image's visual content. To overcome these problems, an image annotation model based on multi-scale salient regions and relevant visual keywords is proposed. In this model, each image is segmented using a multi-scale grid segmentation method, and a global contrast based method is used to extract a saliency map from each image region. Visual features are then extracted from each salient region. In addition, each keyword is classified as either an abstract word or a non-abstract word. Visual seeds are established for each non-abstract word, and a new method is proposed to extract visual keyword collections using the corresponding seeds. Based on the traits of abstract words, an algorithm based on subtraction regions is proposed to extract the visual seeds and corresponding visual keyword collections of each abstract word. An adaptive parameter method and a fast solution algorithm are proposed to determine the similarity threshold of each keyword. Finally, multi-scale visual features and combinations of the above methods are used to improve annotation performance. Our model can improve the object descriptions of images and image regions. Experimental results verify the effectiveness of the proposed model.
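The multi-scale grid segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid sizes or the exact partitioning rule, so the function names `grid_regions` and `multi_scale_regions`, the scales `(1, 2, 4)`, and the equal-sized cell layout are all assumptions made for illustration, not the authors' implementation. Saliency extraction and feature computation are not shown.

```python
from typing import Dict, List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def grid_regions(height: int, width: int, n: int) -> List[Box]:
    """Split a height x width image into an n x n grid of boxes (assumed layout)."""
    if n < 1 or n > min(height, width):
        raise ValueError("n must be between 1 and the smaller image side")
    # Integer edges spread any remainder so the cells tile the image exactly.
    rows = [i * height // n for i in range(n + 1)]
    cols = [j * width // n for j in range(n + 1)]
    return [
        (rows[i], cols[j], rows[i + 1], cols[j + 1])
        for i in range(n)
        for j in range(n)
    ]


def multi_scale_regions(
    height: int, width: int, scales: Tuple[int, ...] = (1, 2, 4)
) -> Dict[int, List[Box]]:
    """Map each grid size (a hypothetical choice of scales) to its boxes."""
    return {n: grid_regions(height, width, n) for n in scales}


# Example: a 100 x 150 image segmented at three assumed scales.
regions = multi_scale_regions(100, 150)
```

Each box could then be passed to a saliency detector so that features are computed only on the salient part of the region, which is the role the abstract assigns to the global contrast based method.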