Toward jointly understanding social relationships and characters from videos

Yiyang Teng, Chenguang Song, Bin Wu
DOI: https://doi.org/10.1007/s10489-021-02738-z
IF: 5.3
2021-08-18
Applied Intelligence
Abstract: Automatically recognizing social relationships from videos provides intelligent systems with great potential to better understand the behaviors or emotions of human beings. Most existing methods mainly focus on inferring social characters by detecting their interactions or independently predicting each social relationship. However, they cannot directly learn all social relationships and characters. In this paper, we propose a character and relationship joint learning (CRJL) framework to simultaneously infer all social relationships and character pairs involved in videos. First, the video context and the logical associations among relationships provide important cues for social scene understanding. To incorporate these cues into social relationships and character reasoning, we design a novel character and relationship reasoning graph (CRRG). Specifically, we model the relationship passing process on the graph to learn the logical constraints among relationships. We also introduce a graph attention mechanism to capture discriminative video semantic information. Second, localizing a social character pair via supervised learning is time-consuming, as it requires the annotation of video tracks. Instead, we propose a weak label-based training strategy using clip-level relationships. Experimental results on a public benchmark demonstrate the superiority of our method.
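The abstract mentions a graph attention mechanism but gives no formulation. As a rough illustration of the general idea only, the sketch below weights a node's neighbors by a softmax over dot-product scores and sums their features. The function names, the dot-product scoring, and the toy feature vectors are all assumptions made for this example; they are not taken from the paper and do not represent the CRRG design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(query, neighbors):
    """Generic dot-product attention over neighbor feature vectors.

    Hypothetical helper, not the paper's method: each neighbor is scored
    by its dot product with the query, scores are normalized with softmax,
    and the neighbor features are summed using those weights.
    """
    scores = [sum(q * n for q, n in zip(query, nb)) for nb in neighbors]
    weights = softmax(scores)
    dim = len(query)
    aggregated = [
        sum(w * nb[i] for w, nb in zip(weights, neighbors)) for i in range(dim)
    ]
    return weights, aggregated


# Toy usage: a node attends to two neighbors with made-up 2-D features.
weights, aggregated = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, aggregated)
```

In this toy setting the neighbor whose features align more closely with the query receives the larger weight, which is the sense in which attention can emphasize more discriminative context.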
computer science, artificial intelligence