Cross-Level Guided Attention for Human-Object Interaction Detection

Zongxu Yue, Ge Li, Wei Gao
DOI: https://doi.org/10.1109/icmew59549.2023.00055
2023-01-01
Abstract: Recently, transformer-based methods have achieved strong performance on the human-object interaction (HOI) detection task. However, most of them directly use the semantically high-level features output by the deep layers of a pre-trained backbone to produce the final HOI detection results. We believe this may limit further performance gains because of the semantic gap between the upstream pre-training task and the HOI detection task. In this work, we design a Cross-Level Guided Attention Network (CLAN) for HOI detection. The proposed method uses the semantically high-level features learned in the pre-training task to generate attention scores over the low-level, primitive features, thereby extracting the key signals for the HOI detection task. Experiments show that CLAN achieves competitive performance on both the V-COCO and HICO-DET benchmarks.
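The abstract does not give CLAN's exact layers. As a rough illustration only, the general idea of letting high-level features score low-level ones can be sketched as standard scaled dot-product cross-attention in NumPy: queries are projected from the high-level feature, while keys and values are projected from the low-level feature. The function name `cross_level_attention`, the projection matrices, and all shapes below are hypothetical assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_level_attention(high, low, w_q, w_k, w_v):
    """Hypothetical sketch: high-level tokens (queries) attend to low-level tokens (keys/values).

    high: (n_high, d_high) semantically high-level features (assumed shape)
    low:  (n_low, d_low) low-level, primitive features (assumed shape)
    w_q:  (d_high, d) query projection; w_k, w_v: (d_low, d) key/value projections
    Returns the attended features (n_high, d) and the attention scores (n_high, n_low).
    """
    q = high @ w_q
    k = low @ w_k
    v = low @ w_v
    d = q.shape[-1]
    # Each high-level query produces a distribution of scores over all low-level positions.
    scores = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return scores @ v, scores


# Toy inputs with random values; sizes are illustrative assumptions only.
rng = np.random.default_rng(0)
high = rng.standard_normal((4, 32))  # e.g. 4 tokens from a deep backbone layer
low = rng.standard_normal((64, 16))  # e.g. a flattened 8x8 map from a shallow layer
w_q = rng.standard_normal((32, 8))
w_k = rng.standard_normal((16, 8))
w_v = rng.standard_normal((16, 8))

out, attn = cross_level_attention(high, low, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 8) (4, 64)
```

In this sketch the high-level feature only decides where to look, while the returned values still come from the low-level feature, which matches the abstract's description of using high-level semantics to select signals from primitive features.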