Exploring the Consistency of Segment-level and Video-level Predictions for Improved Temporal Concept Localization in Videos

Zejia Weng, Rui Wang, Yu-Gang Jiang
2019-01-01
Abstract: In contrast to the earlier video-level classification task, the 2019 YouTube-8M video understanding challenge focuses on temporally localizing entities within videos. To this end, human-verified segment-level annotations are provided for learning temporal localization models. This paper introduces our system designed for the challenge. We exploit the consistency between segment-level and video-level predictions and ensemble models built on different feature aggregation methods, including variants of NetVLAD, Soft-Bag-of-Feature, Gated-Bag-of-Feature, Fisher Vector, and Average Pooling. Experimental results demonstrate the effectiveness of our system on this task. With the proposed system, we achieve 0.82620 in terms of MAP@100,000, ranking 2nd among all submissions in the challenge.