Human-object Interaction Detection with Depth-Augmented Clues

Yamin Cheng, Hancong Duan, Chen Wang, Zhi Wang
DOI: https://doi.org/10.1016/j.neucom.2022.05.014
IF: 6
2022-01-01
Neurocomputing
Abstract: Human-object interaction (HOI) detection aims to localize and classify triplets of human, object, and relationship in a given image. Unlike previous methods that extract visual information only from RGB images, we propose a Depth-augmented Relationship Reasoning (DRR) method that attends to RGB images and their corresponding depth information simultaneously. Revisiting the principles of photography, we argue that RGB images discard spatial depth, which carries the third-dimension relative distance between instances. In light of this, we first estimate the depth of each image, yielding a corresponding depth map. We then leverage multiple representations encoded from the depth information and the RGB images to enrich semantic interpretation. Subsequently, we explore a hierarchical attention strategy that fuses these semantic representations into depth-augmented features, which are used to reason about fine-grained human-object interactions. Extensive experiments on the benchmark datasets V-COCO, HICO-DET, and HCVRD verify the effectiveness of our method and demonstrate the importance of spatial depth information for HOI detection.
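The abstract does not specify how the attention-based fusion is computed, so the following is only a minimal sketch of the general idea: weighting an RGB feature vector and a depth feature vector with softmax attention scores and summing them. It is a single-level fusion, not the hierarchical strategy or the DRR architecture of the paper. The names `attention_fuse`, `rgb_feat`, `depth_feat`, and `query`, along with the toy values, are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(query, features):
    """Fuse equally sized feature vectors using scaled dot-product attention.

    Illustrative assumption only: the paper's actual fusion is hierarchical
    and is not described in enough detail in the abstract to reproduce.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(len(query))
    ]
    return fused, weights


# Hypothetical toy features standing in for RGB and depth representations.
rgb_feat = [0.2, 0.8, 0.1, 0.5]
depth_feat = [0.9, 0.1, 0.4, 0.3]
query = [1.0, 0.0, 0.0, 0.0]  # hypothetical query vector

fused, weights = attention_fuse(query, [rgb_feat, depth_feat])
print("attention weights:", weights)
print("fused feature:", fused)
```

In this toy setup the query aligns more strongly with the depth feature, so the depth feature receives the larger attention weight and the fused vector lies between the two inputs.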