Spatial-temporal dual-actor CNN for human interaction prediction in video

Mahlagha Afrasiabi, Hassan Khotanlou, Theo Gevers
DOI: https://doi.org/10.1007/s11042-020-08845-2
IF: 2.577
2020-04-08
Multimedia Tools and Applications
Abstract: Predicting the interaction between two people when only part of a video has been viewed is one of the most challenging problems in computer vision, and it has many applications. This paper presents a new interaction prediction method that achieves high accuracy when only a small percentage of the video has been observed. First, the interacting people are detected; a dual-actor CNN model then recognizes the type of interaction between them. The model consists of two CNNs with shared parameters, and each branch of the model extracts deep temporal or spatial features. The spatial and temporal models are trained with Long Short-Term Memory (LSTM) networks to capture temporal information. Finally, the spatial and temporal models are combined to predict the interaction. The results show that the proposed model yields improvements on standard interaction recognition datasets, including TV Human Interaction, BIT Interaction, and UT Interaction.
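The abstract does not say how the spatial and temporal models are combined, so the following is only a minimal sketch of one common choice, weighted late fusion of per-class probabilities. The function names, the `spatial_weight` parameter, the class labels, and the raw scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_scores, temporal_scores, spatial_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: late fusion by averaging. The paper may combine the
    spatial and temporal models differently.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    w = spatial_weight
    return [w * ps + (1.0 - w) * pt for ps, pt in zip(p_spatial, p_temporal)]


def predict(spatial_scores, temporal_scores, labels, spatial_weight=0.5):
    """Return the label with the highest fused probability and that probability."""
    fused = fuse_streams(spatial_scores, temporal_scores, spatial_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


# Hypothetical labels and scores for a partially observed clip.
labels = ["handshake", "hug", "kick"]
label, confidence = predict([2.0, 0.1, 0.1], [1.5, 0.2, 0.3], labels)
print(label, round(confidence, 3))
```

In this sketch, `spatial_weight` controls how much the spatial stream contributes relative to the temporal stream; a value of 0.5 treats both equally.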
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering