Multi-branch Graph Network for Learning Human-Object Interaction.

Tongtong Wu, Xu Zhang, Fuqing Duan, Liang Chang
DOI: https://doi.org/10.1007/978-3-030-88013-2_35
2021-01-01
Abstract: In this work, we study the task of detecting human-object interactions (HOI) in images, defined as detecting triplets of (human, predicate, object). A common practice in the literature is to first localize human and object instances and then infer either the full triplets or only the predicates, as a classification task over the detected human-object pairs. A data sparsity issue arises when inferring the triplets because of the severe data imbalance among HOI classes, while a data variance issue arises when inferring only the predicates, since a predicate can carry different semantic meanings when applied to different objects. To resolve these problems, we propose to decompose HOI classes that share a predicate into several semantic groups based on the appearance, semantic information, and function of the objects. In this way, semantically related HOI classes are grouped together to mitigate the data sparsity issue, while visually and functionally less related HOI classes are separated to relieve the data variance issue. We show that multiple levels of decomposition at different granularities provide richer auxiliary information that boosts performance. We implement this idea with a multi-branch graph network, in which the branches make classifications based on different levels of decomposition. We evaluate our method on the popular HICO-Det dataset. Experimental results show that our method achieves state-of-the-art performance.
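The decomposition idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the HOI classes, the object groupings, and the names `decompose` and `branch_targets` are all hypothetical and chosen only for illustration. The sketch shows how HOI classes that share a predicate might be split into groups at two granularities, giving one classification target per branch.

```python
from collections import defaultdict

# Hypothetical HOI classes as (predicate, object) pairs; not the HICO-Det label set.
HOI_CLASSES = [
    ("ride", "bicycle"), ("ride", "motorcycle"),
    ("ride", "horse"), ("ride", "elephant"),
    ("hold", "cup"), ("hold", "bottle"), ("hold", "horse"),
]

# Hypothetical object groupings at two granularities (assumed, not from the paper).
COARSE = {
    "bicycle": "vehicle", "motorcycle": "vehicle",
    "horse": "animal", "elephant": "animal",
    "cup": "container", "bottle": "container",
}
FINE = {
    "bicycle": "two_wheeler", "motorcycle": "two_wheeler",
    "horse": "equine", "elephant": "large_mammal",
    "cup": "drinkware", "bottle": "drinkware",
}


def decompose(hoi_classes, object_to_group):
    """Map each (predicate, object group) label to the HOI classes it covers."""
    groups = defaultdict(list)
    for predicate, obj in hoi_classes:
        groups[(predicate, object_to_group[obj])].append((predicate, obj))
    return dict(groups)


def branch_targets(hoi, levels):
    """Return one (predicate, group) target per decomposition level for an HOI class."""
    predicate, obj = hoi
    return [(predicate, mapping[obj]) for mapping in levels]


coarse_groups = decompose(HOI_CLASSES, COARSE)
fine_groups = decompose(HOI_CLASSES, FINE)
```

Under these assumed groupings, "ride bicycle" and "ride motorcycle" share a label at both levels, which pools their training examples, while "ride horse" and "ride elephant" share a label only at the coarse level. A call such as `branch_targets(("ride", "horse"), [COARSE, FINE])` yields the per-branch targets for one HOI class.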