Action Keyframe Connection Network for Temporal Action Proposal Generation

Shengbo Wang, Zhenjiang Miao, Tianyu Zhou, Miaomiao Li, Ruyi Zhang
DOI: https://doi.org/10.1088/1742-6596/1229/1/012035
2019-01-01
Abstract: Temporal action detection is an important research topic in computer vision, and Temporal Action Proposal (TAP) generation is a key step in it, responsible for finding candidate action segments. This paper presents a new, effective and efficient deep architecture for generating action proposals in temporally untrimmed videos, named the action keyframe connection network. First, a two-stream network is adopted to extract frame-level features, which include appearance features and optical flow features. The temporal information helps the subsequent network determine whether a frame is the beginning or the end of an action. Second, a position discrimination network is designed to infer the probability of each frame being a starting frame or an ending frame. The network outputs a starting probability sequence and an ending probability sequence, which indicate the start and the end of the action, respectively. Finally, the network generates proposals by a specific threshold rule that combines points in the starting probability sequence and the ending probability sequence. We carry out experiments on the ActivityNet dataset to compare the proposed method with state-of-the-art methods. Experimental results show that our method achieves superior performance over the other methods.
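The final step, combining the two probability sequences into proposals, can be illustrated with a minimal sketch. The abstract does not specify the threshold rule, so everything here is an assumption made for illustration: the function names `candidate_frames` and `generate_proposals`, the fixed `threshold`, the `max_length` limit on proposal duration, and the scoring of a proposal by the product of its start and end probabilities. It should not be read as the authors' actual rule.

```python
from typing import List, Tuple


def candidate_frames(probs: List[float], threshold: float) -> List[int]:
    """Return the frame indices whose probability meets the threshold.

    Assumption: a plain fixed threshold; the paper's rule may differ.
    """
    return [i for i, p in enumerate(probs) if p >= threshold]


def generate_proposals(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,
    max_length: int = 100,
) -> List[Tuple[int, int, float]]:
    """Pair candidate start frames with later candidate end frames.

    Returns (start_frame, end_frame, score) tuples sorted by descending
    score. The score (product of the two probabilities) and the
    max_length limit are illustrative assumptions.
    """
    if len(start_probs) != len(end_probs):
        raise ValueError("probability sequences must have the same length")

    starts = candidate_frames(start_probs, threshold)
    ends = candidate_frames(end_probs, threshold)

    proposals = []
    for s in starts:
        for e in ends:
            # A valid proposal ends after it starts and is not too long.
            if s < e <= s + max_length:
                proposals.append((s, e, start_probs[s] * end_probs[e]))

    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


if __name__ == "__main__":
    # Toy per-frame probabilities, made up for this example.
    start_probs = [0.1, 0.9, 0.2, 0.1, 0.3, 0.1]
    end_probs = [0.0, 0.1, 0.2, 0.8, 0.1, 0.6]
    for start, end, score in generate_proposals(start_probs, end_probs):
        print(f"frames {start}-{end}: score {score:.2f}")
```

With the toy sequences above, frame 1 is the only start candidate and frames 3 and 5 are the end candidates, so the sketch yields two proposals, (1, 3) and (1, 5). Lowering `max_length` discards the longer one.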