Shifted GCN-GAT and Cumulative-Transformer Based Social Relation Recognition for Long Videos

Haorui Wang, Yibo Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu
DOI: https://doi.org/10.1145/3581783.3612175
2023-01-01
Abstract: Social relation recognition is an important part of video understanding, providing insight into the information that videos convey. Most previous work has focused on generating graphs for characters rather than for edges, which are better suited to relation modelling. Furthermore, previous methods tend to recognize social relations in single frames or short video clips within their receptive fields, neglecting the importance of continuous reasoning throughout the entire video. To tackle these challenges, we propose a novel Shifted GCN-GAT and Cumulative-Transformer framework, named SGCAT-CT. The overall architecture consists of an SGCAT module for shifted graph operations on novel relation graphs and a CT module for temporal processing with memory. SGCAT-CT recognizes social relations continuously and memorizes information from as early as the beginning of a long video. Experiments on several video datasets demonstrate encouraging performance on long videos. Our code will be released at https://github.com/HarryWgCN/SGCAT-CT.