A Two-Stage Temporal Proposal Network for Precise Action Localization in Untrimmed Video
Wang Fei,Wang Guorui,Du Yuxuan,He Zhenquan,Jiang Yong
DOI: https://doi.org/10.1007/s13042-021-01301-z
2021-01-01
International Journal of Machine Learning and Cybernetics
Abstract:In this paper, we propose a two-stage temporal proposal algorithm for the action detection task of long untrimmed videos. In the first stage, we propose a novel prior-minor watershed algorithm for action proposals with precise prior watershed proposal algorithm and minor supplementary sliding window algorithm. Here, we propose the correctness discriminator to fill the proposals that watershed proposal algorithm may omit with the sliding window proposals. In the second stage, an extended context pooling (ECP) is firstly proposed with two modules (internal and context). The context information module of ECP can structure the proposals and enhance the extended features of action proposals. Different level of ECP is introduced to model the action proposal region and make its extended context region more targeted and precise. Then, we propose a temporal context regression network, which adopts a multi-task loss to realize the training of the temporal coordinate regression and the action/background classification simultaneously, and outputs the precise temporal boundaries of the proposals. Here, we also propose prior-minor ranking to balance the effect of the prior watershed proposals and the minor supplementary proposals. On three large scale benchmarks THUMOS14, ActivityNet (v1.2 and v1.3), and Charades, our approach achieves superior performances compared with other state-of-the-art methods and runs over 1020 frames per second (fps) on a single NVIDIA Titan-X Pascal GPU, indicating that our method can efficiently improve the precision of action localization task.