CTAP: Complementary Temporal Action Proposal Generation

Jiyang Gao, Kan Chen, Ram Nevatia

DOI: https://doi.org/10.48550/arXiv.1807.04821

2018-07-19

Abstract: Temporal action proposal generation is an important task. Akin to object proposals, temporal action proposals are intended to capture "clips" or temporal intervals in videos that are likely to contain an action. Previous methods can be divided into two groups: sliding window ranking and actionness score grouping. Sliding windows uniformly cover all segments in videos, but their temporal boundaries are imprecise; grouping-based methods may have more precise boundaries, but they may omit some proposals when the quality of the actionness score is low. Based on the complementary characteristics of these two methods, we propose a novel Complementary Temporal Action Proposal (CTAP) generator. Specifically, we apply a Proposal-level Actionness Trustworthiness Estimator (PATE) on the sliding window proposals to generate the probabilities indicating whether the actions can be correctly detected by actionness scores; the windows with high scores are collected. The collected sliding windows and actionness proposals are then processed by a temporal convolutional neural network for proposal ranking and boundary adjustment. CTAP outperforms state-of-the-art methods on average recall (AR) by a large margin on the THUMOS-14 and ActivityNet 1.3 datasets. We further apply CTAP as a proposal generation method in an existing action detector, and show consistent significant improvements.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the generation of high-quality temporal action proposals in videos. Temporal action proposals aim to capture "clips", or temporal intervals, that are likely to contain an action, and the authors focus on improving their accuracy.

Existing methods fall into two categories: sliding window ranking and actionness score grouping. Sliding windows cover all parts of the video uniformly, but their temporal boundaries are imprecise. Grouping-based methods may produce more precise boundaries, but they can miss proposals when the quality of the actionness score is low. The paper therefore proposes a Complementary Temporal Action Proposal (CTAP) generator that combines the advantages of the two approaches. CTAP consists of three modules:

1. **Initial Proposal Generation**: Initial proposals are generated from two sources: actionness scores grouped by Temporal Action Grouping (TAG), and sliding windows.
2. **Proposal Complementary Filtering**: Because TAG misses correct proposals when the actionness score quality is low, while sliding windows cover the whole video uniformly, a complementary filter collects high-quality sliding windows to fill in the proposals that TAG misses.
3. **Proposal Ranking and Boundary Adjustment**: A temporal convolutional neural network ranks the proposals and adjusts their temporal boundaries, retaining the ordering information around the proposal boundaries.

The paper reports experiments on the THUMOS-14 and ActivityNet 1.3 datasets. CTAP significantly outperforms existing methods in terms of Average Recall (AR), and it also yields consistent improvements when used as the proposal generator in an existing action detector.
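The first two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `group_actionness` is a simple threshold-based stand-in for TAG, and `should_collect` is a hypothetical callable standing in for the decision made by the learned PATE module. All function names, parameters, and default values are assumptions chosen for illustration, and the ranking and boundary adjustment network is omitted.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (start, end), measured in video units


def sliding_windows(num_units: int, lengths: List[int],
                    overlap: float = 0.5) -> List[Interval]:
    """Uniformly cover [0, num_units) with windows of several lengths."""
    windows = []
    for length in lengths:
        stride = max(1, int(length * (1.0 - overlap)))
        for start in range(0, max(1, num_units - length + 1), stride):
            end = min(start + length, num_units)
            windows.append((float(start), float(end)))
    return windows


def group_actionness(scores: List[float], threshold: float) -> List[Interval]:
    """Simplified stand-in for TAG: merge consecutive units whose
    actionness score is at or above the threshold."""
    proposals, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            proposals.append((float(start), float(i)))
            start = None
    if start is not None:
        proposals.append((float(start), float(len(scores))))
    return proposals


def tiou(a: Interval, b: Interval) -> float:
    """Temporal intersection over union of two intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def complementary_candidates(
    windows: List[Interval],
    grouped: List[Interval],
    should_collect: Callable[[Interval], bool],
) -> List[Interval]:
    """Union of grouped proposals and the sliding windows that the
    (hypothetical) PATE-like decision function chooses to collect."""
    collected = [w for w in windows if should_collect(w)]
    return grouped + collected


if __name__ == "__main__":
    # Toy actionness scores: one confident action and one weak action.
    scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.4, 0.4, 0.4, 0.4, 0.1, 0.1]
    grouped = group_actionness(scores, threshold=0.5)
    windows = sliding_windows(len(scores), lengths=[4])
    # Toy rule (an assumption, not PATE): collect windows that no grouped
    # proposal matches well.
    candidates = complementary_candidates(
        windows, grouped,
        lambda w: all(tiou(w, g) < 0.5 for g in grouped),
    )
    print(grouped)
    print(candidates)
```

In this toy setting the weak action in the second half of the score sequence falls below the grouping threshold, so only sliding windows can cover it. In CTAP the collection decision comes from a trained estimator rather than a hand-written rule, and the resulting candidates are then ranked and refined by the temporal convolutional network.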


Exploiting Semantic-Level Affinities with a Mask-Guided Network for Temporal Action Proposal in Videos

Play and rewind: Context-aware video temporal action proposals

Content Temporal Relation Network for temporal action proposal generation

Cascaded Boundary Network for High-Quality Temporal Action Proposal Generation

Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation

RecapNet: Action Proposal Generation Mimicking Human Cognitive Process

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

TAN: a temporal-aware attention network with context-rich representation for boosting proposal generation

Relaxed Transformer Decoders for Direct Action Proposal Generation

Superframe-Based Temporal Proposals for Weakly Supervised Temporal Action Detection

ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation

Estimation of Reliable Proposal Quality for Temporal Action Detection

Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization

TSI: Temporal Scale Invariant Network for Action Proposal Generation

Temporal Attention Network for Action Proposal

Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network

Truncated Attention-Aware Proposal Networks with Multi-Scale Dilation for Temporal Action Detection

BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

Multi-Level Content-Aware Boundary Detection for Temporal Action Proposal Generation

SAP: Self-Adaptive Proposal Model for Temporal Action Detection Based on Reinforcement Learning