Abstract: Human-Object Interaction (HOI) detection aims to infer the interactions that occur between humans and related objects in images. An HOI is usually represented by a triplet <human, action, object> and can be modeled as a graph. Graph-based methods can therefore detect interactions using the global structural information of an image. However, although existing graph networks build various fully-connected graphs, they either treat all detected bounding boxes equally as graph nodes or divide them into node types by category, and thereby ignore the dominant role of humans in HOI. In addition, object node representations mainly focus on appearance features, which contribute little to HOI inference. To address these issues, we propose a novel graph-based HOI detection model, named interaction-centric graph parsing network (iCGPN), which models one human node as the central node and the other nodes as semantic nodes. First, for each detected human instance, a human-centric fully-connected graph is constructed to learn the related HOIs. Second, to reflect the difference between central nodes and semantic nodes, we design different feature representations and model different edge relationships. An attention mechanism explores global information related to human-object interaction to enrich the semantic node representation, which also combines spatial layout, relative location, and object category information. Finally, a multi-relation graph convolutional network is applied to update the node features and infer the HOI. Furthermore, a multi-IoU random shift scheme is proposed to augment the training data, so that the network fits object detection deviations and generalizes better. Extensive experimental results show that iCGPN achieves very competitive results in comparison with state-of-the-art graph-based methods on the V-COCO and HICO-DET datasets, which demonstrates the effectiveness of the proposed method.
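
The abstract does not specify how the multi-IoU random shift scheme samples boxes. As a minimal sketch, assuming the scheme jitters each ground-truth box and keeps a candidate whose IoU with the original box meets each of several thresholds, the augmentation might look like the following. The function names, the IoU levels (0.5, 0.7, 0.9), and the jitter fraction are illustrative assumptions, not details taken from the paper.

```python
import random


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def random_shift(box, min_iou, rng, max_tries=100, max_frac=0.3):
    """Jitter the box corners (hypothetical helper).

    Returns the first candidate whose IoU with the original box is at
    least min_iou, or the original box if no candidate qualifies.
    """
    w, h = box[2] - box[0], box[3] - box[1]
    for _ in range(max_tries):
        # Move each corner coordinate by up to max_frac of the box size.
        cand = (
            box[0] + rng.uniform(-max_frac, max_frac) * w,
            box[1] + rng.uniform(-max_frac, max_frac) * h,
            box[2] + rng.uniform(-max_frac, max_frac) * w,
            box[3] + rng.uniform(-max_frac, max_frac) * h,
        )
        if iou(box, cand) >= min_iou:
            return cand
    return box


def multi_iou_shift(box, iou_levels=(0.5, 0.7, 0.9), seed=0):
    """Produce one shifted copy of the box per IoU level (assumed scheme)."""
    rng = random.Random(seed)
    return [random_shift(box, level, rng) for level in iou_levels]


if __name__ == "__main__":
    original = (10.0, 20.0, 110.0, 220.0)
    for level, shifted in zip((0.5, 0.7, 0.9), multi_iou_shift(original)):
        print(level, round(iou(original, shifted), 3), shifted)
```

Sampling at several IoU levels, rather than one, exposes the model to both loosely and tightly localized boxes, which is one plausible way to imitate the range of deviations an object detector produces at test time.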

Learning Human-Object Interactions by Graph Parsing Neural Networks

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Cascaded Parsing of Human-Object Interaction Recognition

iCGPN: Interaction-centric graph parsing network for human-object interaction detection

Relation Parsing Neural Network for Human-Object Interaction Detection

Interactivity Recognition Graph Neural Network (IR-GNN) Model for Improving Human–Object Interaction Detection

Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

IPGN: Interactiveness Proposal Graph Network for Human-Object Interaction Detection

A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection

Learning Human-Object Interaction Detection Using Interaction Points

Learning Human-Object Interaction via Interactive Semantic Reasoning

Hierarchical Reasoning Network for Human-Object Interaction Detection

Learning to Detect Human-Object Interactions

Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network

Contextual Heterogeneous Graph Network for Human-Object Interaction Detection

GID-Net: Detecting human-object interaction with global and instance dependency

GTNet: Guided Transformer Network for Detecting Human-Object Interactions

Exploiting Scene Graphs for Human-Object Interaction Detection

Detecting and Recognizing Human-Object Interactions

Versatile Graph Neural Networks Toward Intuitive Human Activity Understanding

TMHOI: Translational Model for Human-Object Interaction Detection