Stagnet: an Attentive Semantic RNN for Group Activity and Individual Action Recognition

Mengshi Qi,Yunhong Wang,Jie Qin,Annan Li,Jiebo Luo,Luc Van Gool
DOI: https://doi.org/10.1109/tcsvt.2019.2894161
IF: 5.859
2020-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract:In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the “factor sharing” and “message passing” mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.
What problem does this paper attempt to address?