Learning from Easy to Hard Pairs: Multi-step Reasoning Network for Human-Object Interaction Detection

Yuchen Zhou, Guang Tan, Mengtang Li, Chao Gou
DOI: https://doi.org/10.1145/3581783.3612581
2023-01-01
Abstract: Human-object interaction (HOI) detection aims to interpret the interactions of human-object pairs. Existing methods adopt a one-step reasoning paradigm that simultaneously outputs multi-label results for all HOI pairs without distinguishing their difficulty. However, HOI pairs in the same image vary significantly, so the performance of these methods degrades in challenging situations. In this paper, we argue that the model should prioritize hard samples after inferring easy ones, and that hard samples can benefit from easy ones. To this end, we propose a novel Multi-step Reasoning Network that progressively learns from easy to hard samples. In particular, an Easy-to-Hard Learning Block is introduced to enhance the representation of hard HOI pairs using prior associations. Additionally, we propose a Multi-step Reasoning Probability Transfer mechanism, which leverages cognitive associations and semantic dependencies to enhance multi-label interaction classification. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods on two challenging benchmark datasets.