Multi-Stream Interaction Networks for Human Action Recognition
Haoran Wang, Baosheng Yu, Jiaqi Li, Linlin Zhang, Dongyue Chen
DOI: https://doi.org/10.1109/tcsvt.2021.3098839
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Skeleton-based human action recognition has received extensive attention due to its efficiency and robustness to complex backgrounds. Although the human skeleton accurately captures the dynamics of human poses, it cannot by itself distinguish actions defined by the interaction between a human and objects, which makes it important to model human-object interaction explicitly. In this paper, we devise multi-stream interaction networks (MSIN) to simultaneously explore the dynamics of the human skeleton, of objects, and of the interaction between the human and objects. Specifically, apart from the traditional human skeleton stream, 1) the second stream explores the dynamics of object appearance from the objects surrounding the human body joints; and 2) the third stream captures the dynamics of object position in terms of the distance between the object and different human body joints. Experimental results on three popular skeleton-based human action recognition datasets, NTU RGB+D, NTU RGB+D 120, and SYSU, demonstrate the effectiveness of the proposed method, especially for recognizing human actions that involve human-object interactions.
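The input to the third stream, as described in the abstract, is the distance between an object and the individual body joints over time. The following is a minimal sketch of one way such a feature could be computed, not the authors' implementation. The function name `joint_object_distances`, the array shapes, the single-object setup, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def joint_object_distances(joints, object_centers):
    """Per-frame Euclidean distance from one object to every body joint.

    Hypothetical helper, not taken from the paper.
    joints:         array of shape (T, J, 3), T frames, J joints, xyz coordinates
    object_centers: array of shape (T, 3), one object position per frame
    returns:        array of shape (T, J)
    """
    joints = np.asarray(joints, dtype=float)
    object_centers = np.asarray(object_centers, dtype=float)
    # Broadcast the object position across the joint axis, then take the norm.
    offsets = joints - object_centers[:, None, :]
    return np.linalg.norm(offsets, axis=-1)


# Usage: 2 frames, 3 joints, object fixed at the origin.
example_joints = np.zeros((2, 3, 3))
example_joints[0, 1] = [3.0, 4.0, 0.0]
example_distances = joint_object_distances(example_joints, np.zeros((2, 3)))
```

The resulting (T, J) sequence could then be fed to a temporal model alongside the skeleton stream; how MSIN actually encodes and fuses it is not specified in the abstract.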
engineering, electrical & electronic