A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction

Yi Xing, Yaping Dai, Kaoru Hirota, Zhiyang Jia
DOI: https://doi.org/10.1109/cac53003.2021.9727900
2021-01-01
Abstract: In this paper, a transformer-based dual position attention network (TDPAN) is proposed for recognizing human-object interaction. The dual attention module, which embeds position information, is designed to scan the entire area of an image space and adaptively aggregate crucial features. Moreover, the transformer architecture is adopted to effectively extract essential region features in a binary pairwise manner from the sequence image data. Compared with CNN-based methods, TDPAN feature aggregation not only requires no prior modification of the region of interest but also focuses more on the important context information in images. Experiments demonstrate that TDPAN outperforms previous methods on two datasets (HICO-DET and V-COCO). Specifically, recognition accuracy increases by 4.3% over prior convolutional neural network methods on the V-COCO dataset.
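The idea of attention that embeds position information and aggregates features over the whole image can be illustrated with a minimal sketch. This is not the authors' TDPAN module: the abstract does not specify the encoding or the attention form, so the sinusoidal 2D encoding, the scaled dot-product scoring, and the names `position_encoding` and `position_attention` are all assumptions chosen for illustration, and only a single attention pass is shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_encoding(row, col, dim):
    """Hypothetical 2D sinusoidal encoding (an assumption, not from the paper).

    The first half of the vector encodes the row, the second half the column.
    """
    assert dim % 2 == 0, "this sketch assumes an even feature dimension"
    half = dim // 2
    enc = []
    for coord in (row, col):
        for i in range(half):
            freq = 1.0 / (10000 ** (2 * (i // 2) / half))
            angle = coord * freq
            enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def position_attention(features, grid_w):
    """Aggregate features over every grid cell with position-aware attention.

    features: list of N feature vectors for a flattened H x W grid (row-major).
    grid_w:   grid width W, used to recover (row, col) from the flat index.
    Returns (aggregated_features, attention_weights).
    """
    dim = len(features[0])
    # Add the position encoding to each feature to form queries and keys.
    tokens = []
    for idx, feat in enumerate(features):
        pe = position_encoding(idx // grid_w, idx % grid_w, dim)
        tokens.append([f + p for f, p in zip(feat, pe)])

    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        # Every location attends to all locations, not a cropped region.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the original (position-free) features.
        outputs.append(
            [sum(w[j] * features[j][d] for j in range(len(features)))
             for d in range(dim)]
        )
    return outputs, weights
```

Calling `position_attention(features, grid_w=2)` on four 4-dimensional vectors returns four aggregated vectors and a 4 x 4 weight matrix whose rows each sum to 1, which shows how every location draws on context from the entire grid rather than from a pre-selected region of interest.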