Bottom-Up Temporal Action Localization with Mutual Regularization

Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang, Qi Tian
DOI: https://doi.org/10.1007/978-3-030-58598-3_32
2020-01-01
Abstract: Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve evaluating the frame-level probabilities of three action-indicating phases, i.e., starting, continuing, and ending, and then post-processing these predictions for the final localization. This paper delves deep into this mechanism and argues that existing methods, by modeling these phases as individual classification tasks, ignore the potential temporal constraints between them. This can lead to incorrect and/or inconsistent predictions when some frames of the video input lack sufficient discriminative information. To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase, and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases. By jointly optimizing these two terms, the entire framework is aware of these potential constraints during an end-to-end optimization process. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively and qualitatively. The proposed regularization also generalizes to other TAL methods (e.g., TSA-Net and PGCN). Code: https://github.com/PeisenZhao/Bottom-Up-TAL-with-MR
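The bottom-up mechanism described in the abstract (frame-level starting, continuing, and ending probabilities, followed by post-processing) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names local_peaks and generate_proposals, the 0.5 threshold, and the scoring rule (start probability times end probability times mean continuing probability) are all assumptions chosen for illustration, and the IntraC and InterC regularization terms are not modeled here.

```python
def local_peaks(probs, threshold=0.5):
    """Return indices that are local maxima (ties allowed) with probability >= threshold.

    Hypothetical helper; the threshold value is an assumption.
    """
    peaks = []
    for t, p in enumerate(probs):
        left = probs[t - 1] if t > 0 else float("-inf")
        right = probs[t + 1] if t + 1 < len(probs) else float("-inf")
        if p >= threshold and p >= left and p >= right:
            peaks.append(t)
    return peaks


def generate_proposals(start, cont, end, threshold=0.5):
    """Pair each start peak with every later end peak and score the segment.

    The score (assumed, not taken from the paper) multiplies the start
    probability, the end probability, and the mean continuing probability
    over the frames s..e inclusive. Results are sorted by descending score.
    """
    end_peaks = local_peaks(end, threshold)
    proposals = []
    for s in local_peaks(start, threshold):
        for e in end_peaks:
            if e <= s:
                continue  # an action must end after it starts
            mean_cont = sum(cont[s:e + 1]) / (e - s + 1)
            proposals.append((s, e, start[s] * end[e] * mean_cont))
    return sorted(proposals, key=lambda item: item[2], reverse=True)


# Toy per-frame probabilities for a six-frame video (made-up values).
start = [0.1, 0.9, 0.2, 0.1, 0.1, 0.1]
cont = [0.1, 0.8, 0.9, 0.9, 0.8, 0.1]
end = [0.1, 0.1, 0.1, 0.2, 0.9, 0.1]
print(generate_proposals(start, cont, end))
```

In this toy input the three phase predictions agree with each other. When they disagree, for example a confident start peak inside a region with low continuing probability, the candidate's score drops, which is the kind of inconsistency the abstract says independent per-phase classifiers can produce.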