Temporal Action Detection with Structured Segment Networks

Yue Zhao,Yuanjun Xiong,Limin Wang,Zhirong Wu,Xiaoou Tang,Dahua Lin
DOI: https://doi.org/10.1007/s11263-019-01211-2
IF: 13.369
2019-08-28
International Journal of Computer Vision
Abstract:This paper addresses an important and challenging task, namely detecting the temporal intervals of actions in untrimmed videos. Specifically, we present a framework called structured segment network (SSN). It is built on temporal proposals of actions. SSN models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and precise localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end manner. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping is devised to generate high quality action proposals. We further study the importance of the decomposed discriminative model and discover a way to achieve similar accuracy using a single classifier, which is also complementary with the original SSN design. On two challenging benchmarks, THUMOS'14 and ActivityNet, our method remarkably outperforms previous state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling actions with various temporal structures.
computer science, artificial intelligence
What problem does this paper attempt to address?