Graph-based social relation inference with multi-level conditional attention
Xiaotian Yu, Hanling Yi, Qie Tang, Kun Huang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang
DOI: https://doi.org/10.1016/j.neunet.2024.106216
IF: 7.8
2024-02-01
Neural Networks
Abstract: Social relation inference intrinsically requires high-level semantic understanding. To accurately infer the relations between persons in images, one needs not only to understand the scenes and objects in the images, but also to adaptively attend to important clues. Unlike prior works that classify social relations using attention over detected objects, we propose a MUlti-level Conditional Attention (MUCA) mechanism for social relation inference, which attends to scenes, objects and human interactions conditioned on each person pair. We then develop a transformer-style network to realize the MUCA mechanism. The resulting network, named Graph-based Relation Inference Transformer (GRIT), consists of two modules: a Conditional Query Module (CQM) and a Relation Attention Module (RAM). Specifically, we design a graph-based CQM that fuses local features and global context for each person pair to generate informative relation queries for all person pairs. Moreover, RAM takes full advantage of transformer-style networks to apply multi-level attention when classifying social relations. To the best of our knowledge, GRIT is the first method to infer social relations with multi-level conditional attention. GRIT is end-to-end trainable and significantly outperforms existing methods on two benchmark datasets, with performance improvements of 7.8% on PIPA and 9.6% on PISC.
computer science, artificial intelligence, neurosciences
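
The abstract describes queries built per person pair that then attend over scene, object and interaction features. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation: it swaps in a plain linear fusion where the paper uses a graph-based CQM, and a single softmax cross-attention step where the paper uses a transformer-style RAM. The function names (`pair_queries`, `conditional_attention`), the weight matrices, the feature dimensions and the random inputs are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations

import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pair_queries(person_feats, global_ctx, w_pair, w_ctx):
    """Build one query per unordered person pair (i, j) with i < j.

    Assumption: local pair features are the concatenation of the two
    person vectors, fused with a global context vector by linear maps.
    This stands in for the graph-based module named in the abstract.
    """
    n = person_feats.shape[0]
    pairs = list(combinations(range(n), 2))
    local = np.stack(
        [np.concatenate([person_feats[i], person_feats[j]]) for i, j in pairs]
    )                                                   # (P, 2d)
    queries = local @ w_pair + global_ctx @ w_ctx       # (P, d), context broadcast
    return pairs, queries


def conditional_attention(queries, memory):
    """Scaled dot-product attention of pair queries over feature tokens."""
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)            # (P, M)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    return weights @ memory, weights                    # (P, d), (P, M)


# Hypothetical inputs: 3 persons, feature size 8, random values.
rng = np.random.default_rng(0)
d = 8
person_feats = rng.normal(size=(3, d))
global_ctx = rng.normal(size=d)
w_pair = rng.normal(size=(2 * d, d))
w_ctx = rng.normal(size=(d, d))

# "Multi-level" memory: 1 scene token, 4 object tokens, 2 interaction tokens.
memory = np.concatenate(
    [rng.normal(size=(1, d)), rng.normal(size=(4, d)), rng.normal(size=(2, d))]
)

pairs, queries = pair_queries(person_feats, global_ctx, w_pair, w_ctx)
attended, weights = conditional_attention(queries, memory)
print(pairs)
print(attended.shape, weights.shape)
```

Because every person pair has its own query, each row of `weights` gives a different distribution over the scene, object and interaction tokens, which is the sense in which the attention is conditional on the pair. A classifier over `attended` would then predict the relation label; that step is omitted here.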