A survey on social image semantic analysis
Zechao Li, Jinhui Tang
DOI: https://doi.org/10.1360/TB-2022-0938
2023-01-01
Abstract: With the development of social networks and intelligent devices, a growing number of users update, share, and tag their images and videos on social multimedia websites such as Flickr and WeChat. Social multimedia platforms let users share their multimedia data, together with rich context information, with anyone on Earth. This context information, which includes user-provided tags, descriptions, comments, and user information, can benefit the semantic understanding of social images. Many methods have been studied that analyze image semantics by exploring this rich social context; this review refers to them collectively as social image semantic analysis. Social image semantic analysis is an important and fundamental task in multimedia, pattern analysis, and computer vision: it uncovers the semantic information of images by exploring their social context, including user-provided tags, user information, and descriptions. It has been widely studied and has achieved remarkable progress in recent years. To clearly present the development road map of social image semantic analysis methods, this paper presents a comprehensive survey covering both shallow and deep methods. The methods are discussed according to the tasks they address, organized into two widely studied research areas: image-image correlation learning and image-tag correlation learning. Specifically, image-image correlation learning is defined in this review as social image retrieval, which estimates visual similarity with the help of social contextual information. The corresponding methods include social image metric learning methods and social image hashing methods. Image-tag correlation learning is defined in this review as social image tagging, which explores the rich contextual information to estimate the correlations between images and tags. The corresponding methods include tag ranking, tag refinement, social image retagging, tag relevance learning, tag completion, and social tag recommendation. All these methods are analyzed in detail. In addition, two publicly available benchmark datasets for social image semantic analysis, MIRFlickr and NUS-WIDE, are discussed. For NUS-WIDE, two extended datasets, NUS-WIDE-128 and NUS-WIDE-USER, are also presented. The widely used evaluation metrics are then presented: mean average precision (MAP) for social image retrieval, and F1 and the area under the receiver operating characteristic curve (AUC) for social image tagging. For each task, the performance of many relevant state-of-the-art methods is quantitatively compared in terms of these metrics and analyzed to show the impactful technical innovations for social image semantic analysis. Furthermore, several promising research topics and open problems that may attract much attention in the future are highlighted, such as multi-modal pre-trained large models, the causal analysis problem, the interpretability of social image semantic analysis methods, the heavy-tailed distribution problem, the out-of-distribution problem, effective fine-tuning algorithms, and fine-grained semantic analysis models. A plausible road map for dealing with these problems is also presented. The final goal of this review is to give an overview of existing social image semantic analysis methods and to highlight future research directions for the related research areas.
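
As an illustration of the retrieval metric named in the abstract, the following is a minimal sketch of mean average precision (MAP) over ranked retrieval lists. It is not taken from the survey: the function names `average_precision` and `mean_average_precision` are hypothetical, relevance is assumed to be binary, and average precision is normalized by the number of relevant items that appear in the ranked list (an assumption; some benchmark protocols normalize by the total number of relevant items in the database instead).

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one query.

    `relevance` holds 1 (relevant) or 0 (not relevant) for each returned
    image, in ranked order. Assumption: normalization uses the number of
    relevant items present in this list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant result.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a query whose ranked results have relevance `[1, 0, 1, 0]` gets an average precision of (1/1 + 2/3) / 2, and MAP averages such values over all queries.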