LSTM-based Multi-Label Video Event Detection

An-An Liu, Zhuang Shao, Yongkang Wong, Junnan Li, Yu-Ting Su, Mohan Kankanhalli
DOI: https://doi.org/10.1007/s11042-017-5532-x
IF: 2.577
2017-01-01
Multimedia Tools and Applications
Abstract: Since large-scale surveillance videos always contain complex visual events, generating video descriptions effectively and efficiently without human supervision has become essential. To address this problem, we propose a novel architecture, motivated by the sequence-to-sequence network, for jointly recognizing multiple events in a given surveillance video. The proposed architecture can predict what happens in a video directly, without object detection and tracking as preprocessing steps. We evaluate several variants of the proposed architecture with different visual features on a novel dataset prepared by our group. Moreover, we compute a wide range of quantitative metrics to evaluate this architecture. We further compare it to the popular Support Vector Machine-based visual event detection method. The comparison results suggest that the proposed method can outperform traditional computer vision pipelines for visual event detection.