Socializing the Videos: A Multimodal Approach for Social Relation Recognition

Tong Xu, Peilun Zhou, Linkang Hu, Xiangnan He, Yao Hu, Enhong Chen
DOI: https://doi.org/10.1145/3416493
IF: 4.094
2021-01-01
ACM Transactions on Multimedia Computing Communications and Applications
Abstract: As a crucial task for video analysis, social relation recognition for characters not only provides a semantically rich description of video content but also supports intelligent applications, e.g., video retrieval and visual question answering. Unfortunately, due to the semantic gap between visual and semantic features, traditional solutions may fail to reveal the accurate relations among characters. At the same time, the development of social media platforms has promoted the emergence of crowdsourced comments, which may enhance the recognition task with semantic and descriptive cues. To that end, in this article, we propose a novel multimodal-based solution to deal with the character relation recognition task. Specifically, we capture the target character pairs via a search module and then design a multistream architecture for jointly embedding the visual and textual information, in which feature fusion and an attention mechanism are adapted to better integrate the multimodal inputs. Finally, supervised learning is applied to classify character relations. Experiments on real-world data sets validate that our solution outperforms several competitive baselines.
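
The abstract names attention and feature fusion over visual and textual inputs but gives no implementation detail. The following is a minimal sketch of one generic way such a step could look, not the authors' architecture: a visual feature vector acts as a query over comment embeddings, the attended text vector is concatenated with the visual vector, and a linear softmax layer scores relation classes. All function names, dimensions, and weights here are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, items):
    """Dot-product attention: weight each item by softmax(query . item)."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[i] for w, item in zip(weights, items))
              for i in range(dim)]
    return pooled, weights

def fuse_and_classify(visual, comments, class_weights):
    """Attend over comment embeddings, concatenate, and score classes.

    visual: visual feature vector for a character pair (hypothetical)
    comments: list of comment embeddings, same dimension as visual
    class_weights: one weight row per relation class (hypothetical)
    """
    text, attn = attend(visual, comments)
    fused = visual + text  # list concatenation as a simple fusion step
    probs = softmax([dot(row, fused) for row in class_weights])
    return probs, attn

# Toy inputs: two comments, two relation classes.
visual = [1.0, 0.0]
comments = [[1.0, 0.0], [0.0, 1.0]]
class_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
probs, attn = fuse_and_classify(visual, comments, class_weights)
```

In this toy setup the comment aligned with the visual vector receives the larger attention weight. A real system would learn the embeddings and classifier weights from labeled data rather than fixing them by hand.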