Bi-direction Feature Pyramid Temporal Action Detection Network

Jiang He, Yan Song, Haiyu Jiang
DOI: https://doi.org/10.1007/978-3-030-41404-7_63
2020-01-01
Abstract: Temporal action detection in long, untrimmed videos remains a challenging task in video content analysis. Many existing approaches work in two stages: they first generate action proposals and then classify them. Their main drawback is that operations are repeated across the proposal-extraction and classification stages. In this paper, we propose a novel Bi-direction Feature Pyramid Temporal Action Detection (BFPTAD) Network, built on 1D temporal convolutional and deconvolutional layers, that detects action instances directly in long, untrimmed videos. A top-down pathway adds semantic information to the shallow feature maps, and a bottom-up pathway then adds localization information to the deep feature maps. We evaluate our network on the THUMOS14 and ActivityNet benchmarks. Our approach significantly outperforms other state-of-the-art methods, increasing mAP@IoU = 0.5 on THUMOS14 from 44.2% to 52.2%.
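The top-down then bottom-up fusion described in the abstract can be illustrated on a single-channel 1D temporal signal. This is a minimal sketch under stated assumptions, not the BFPTAD implementation: the learned strided 1D convolutions and deconvolutions are replaced by fixed-weight stand-ins (a stride-2, 2-tap averaging convolution for downsampling and a stride-2 transposed convolution with all-ones weights for upsampling), lateral fusion is assumed to be element-wise addition, and the names `conv1d`, `conv_transpose1d`, `DOWN_W`, `UP_W`, and `bi_direction_pyramid` are hypothetical. Channels, detection heads, and training are omitted.

```python
from typing import List, Sequence

# Hypothetical fixed weights standing in for learned layers (assumption).
DOWN_W = [0.5, 0.5]  # stride-2 conv with these taps == 2-tap average pooling
UP_W = [1.0, 1.0]    # stride-2 deconv with these taps == nearest-neighbour upsampling


def conv1d(x: Sequence[float], w: Sequence[float], stride: int) -> List[float]:
    """Valid (no padding), single-channel 1D convolution (cross-correlation)."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return [sum(w[j] * x[i * stride + j] for j in range(k)) for i in range(n_out)]


def conv_transpose1d(x: Sequence[float], w: Sequence[float], stride: int) -> List[float]:
    """Single-channel 1D transposed convolution ("deconvolution"), no padding."""
    k = len(w)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        for j in range(k):
            out[i * stride + j] += xi * w[j]
    return out


def add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise addition, the assumed fusion operator."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [ai + bi for ai, bi in zip(a, b)]


def bi_direction_pyramid(x: Sequence[float], levels: int = 3) -> List[List[float]]:
    """Return fused maps from shallow (long) to deep (short).

    len(x) must be divisible by 2 ** (levels - 1) so that levels align.
    """
    # Backbone: shallow, fine temporal maps down to deep, coarse ones.
    c = [list(x)]
    for _ in range(levels - 1):
        c.append(conv1d(c[-1], DOWN_W, stride=2))

    # Top-down pathway: upsample deeper (more semantic) maps into shallower ones.
    p: List[List[float]] = [[] for _ in range(levels)]
    p[-1] = c[-1]
    for i in range(levels - 2, -1, -1):
        p[i] = add(c[i], conv_transpose1d(p[i + 1], UP_W, stride=2))

    # Bottom-up pathway: downsample shallower (finer-location) maps into deeper ones.
    n = [p[0]]
    for i in range(1, levels):
        n.append(add(p[i], conv1d(n[-1], DOWN_W, stride=2)))
    return n
```

Calling `bi_direction_pyramid([1, 2, 3, 4, 5, 6, 7, 8])` returns fused maps of lengths 8, 4, and 2, and the deepest map is `[15.0, 39.0]`. Running the top-down pass before the bottom-up pass mirrors the order stated in the abstract: shallow maps first receive semantic context from deeper levels, and the bottom-up pass then carries the enriched fine-grained detail back to the deep maps.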