InterFormer: Human Interaction Understanding with Deformed Transformer.

Di He, Zexing Du, Xue Wang, Qing Wang
DOI: https://doi.org/10.1007/978-981-99-4761-4_17
2023-01-01
Abstract: Human interaction understanding (HIU) is a crucial and challenging problem that consists of two subtasks: individual action recognition and pairwise interactive recognition. Previous methods do not fully capture the temporal and spatial correlations involved in understanding human interactions. To alleviate this problem, we decouple HIU into complementary parts in order to explore comprehensive correlations among individuals. Specifically, we design a multi-branch network, named InterFormer, to jointly model these interactive relations. It contains two parallel encoders that generate spatial and temporal features separately, and Spatial-Temporal Transformers (STTransformer) that exploit spatial and temporal contextual information in a cross manner. Extensive experiments on two benchmarks show that the proposed InterFormer achieves state-of-the-art performance.