Refinement of Boundary Regression Using Uncertainty in Temporal Action Localization.

Yunze Chen, Mengjuan Chen, Rui Wu, Jiagang Zhu, Zheng Zhu, Qingyi Gu
2020-01-01
Abstract: Boundary localization is a key component of most temporal action localization frameworks for untrimmed video. Deep-learning methods have brought remarkable progress in this field thanks to large-scale annotated datasets (e.g., THUMOS14 and ActivityNet). However, labeling accurate action boundaries in such datasets is inherently ambiguous. In this paper, we propose a method to model this uncertainty. Specifically, we construct a Gaussian model for predicting the uncertainty variance of the boundary. The captured variance is further used to select more reliable proposals and to refine proposal boundaries by variance voting during post-processing. For most existing one- and two-stage frameworks, more accurate boundaries and reliable proposals can be obtained without additional computation. For the one-stage decoupled single-shot temporal action detection (Decouple-SSAD) [11] framework, we first apply the adaptive pyramid feature fusion method to fuse its features of different scales and optimize its structure. Then, we introduce the uncertainty-based method and improve the state-of-the-art mAP@0.5 from 37.9% to 41.6% on THUMOS14. Moreover, for the two-stage proposal-proposal interaction through a graph convolutional network (PGCN) [33], the same uncertainty method also yields significant improvements on both the THUMOS14 and ActivityNet v1.3 datasets. Code and more details will be available at https://github.com/shadowclouds/Uty.
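The variance voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the voting formula, so the weighting below is an assumption borrowed from the general idea of variance voting in object detection: each neighboring proposal votes on a boundary with a weight that grows with its temporal overlap and shrinks with its predicted variance. The function names, the `sigma_t` parameter, and the sample numbers are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def temporal_iou(segment, segments):
    """Temporal IoU between one [start, end] segment and an (N, 2) array."""
    inter_start = np.maximum(segment[0], segments[:, 0])
    inter_end = np.minimum(segment[1], segments[:, 1])
    inter = np.clip(inter_end - inter_start, 0.0, None)
    union = (segment[1] - segment[0]) + (segments[:, 1] - segments[:, 0]) - inter
    return inter / np.maximum(union, 1e-8)


def variance_vote(segment, neighbors, variances, sigma_t=0.05):
    """Refine both boundaries of `segment` as a weighted mean of `neighbors`.

    Assumed weighting (not confirmed by the abstract):
        weight = exp(-(1 - IoU)^2 / sigma_t) / variance
    `neighbors` and `variances` are (N, 2) arrays holding start/end values
    and the predicted variance of each boundary.
    """
    iou = temporal_iou(segment, neighbors)
    keep = iou > 0.0  # only overlapping proposals may vote
    closeness = np.exp(-((1.0 - iou[keep]) ** 2) / sigma_t)
    weights = closeness[:, None] / variances[keep]
    return (weights * neighbors[keep]).sum(axis=0) / weights.sum(axis=0)


# Hypothetical proposals: the second one has much lower predicted variance.
segments = np.array([[10.0, 20.0], [11.0, 21.0]])
variances = np.array([[1.0, 1.0], [0.01, 0.01]])
refined = variance_vote(segments[0], segments, variances)
```

In this toy case the refined boundaries move away from the first proposal toward the second, because the second proposal reports far lower uncertainty. Under this sketch the step is a cheap weighted average applied during post-processing.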