Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection

Xue Lin, Qi Zou, Xixia Xu
DOI: https://doi.org/10.24963/ijcai.2020/154
2020-01-01
Abstract: Human-object interaction (HOI) detection is important for understanding human-centric scenes, and it is challenging because of the subtle differences between fine-grained actions and the presence of multiple co-occurring interactions. Most approaches tackle these problems by combining multi-stream information or even introducing extra knowledge, but they suffer from a huge combination space and from the domination of non-interactive pairs. In this paper, we propose an Action-Guided attention mining and Relation Reasoning (AGRR) network to address these problems. Relation reasoning on human-object pairs is performed by exploiting contextual compatibility consistency among pairs to filter out the non-interactive combinations. To better discriminate the subtle differences between fine-grained actions, an action-aware attention based on the class activation map is proposed to mine the most relevant features for recognizing HOIs. Extensive experiments on the V-COCO and HICO-DET datasets demonstrate the effectiveness of the proposed model compared with state-of-the-art approaches.
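The abstract does not give implementation details for the action-aware attention, so the following is only a minimal sketch of a generic class activation map (CAM) used as a spatial attention mask. It is not the AGRR network itself. The function names `class_activation_map` and `apply_attention`, the tensor shapes, and the normalization to [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Generic CAM: weight each feature channel by the classifier weight
    of one action class and sum over channels.

    features:   (C, H, W) feature maps from a backbone (assumed shape)
    fc_weights: (num_classes, C) weights of a linear action classifier
    class_idx:  index of the action class to visualize
    """
    # Contract the channel axis: (C,) x (C, H, W) -> (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence for the class
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; leave an all-zero map unchanged
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def apply_attention(features, cam):
    """Reweight every channel by the spatial attention map (hypothetical
    way of 'mining' action-relevant features)."""
    return features * cam[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((8, 4, 4))           # toy feature maps
    fc_weights = rng.standard_normal((3, 8))   # toy classifier, 3 actions
    cam = class_activation_map(features, fc_weights, class_idx=1)
    attended = apply_attention(features, cam)
    print(cam.shape, attended.shape)
```

In this sketch the map for a chosen action class highlights the spatial locations that contribute most to that class, and multiplying the features by it suppresses the rest. How the paper derives, combines, or trains its attention is not stated in the abstract.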