Abstract: Multimedia event detection aims to precisely retrieve videos that contain complex semantic events from a large pool. This work addresses the task under a zero-shot setting, where only brief event-specific textual information (such as event names or a few descriptive sentences) is known and no positive video example is provided. Mainstream approaches to this task are based on middle-level semantic concepts and adopt meticulously crafted concept banks (e.g., LSCOM). We argue that these concept banks remain inadequate in the face of video semantic complexity. Existing semantic concepts are essentially first-order, designed mainly for atomic objects, scenes, or human actions. This work advocates the use of high-order concepts (such as subject-predicate-object triplets or adjective-object pairs). The main contributions are two-fold. First, we harvest a comprehensive yet compact high-order concept library by distilling information from three large public datasets (MS-COCO, Visual Genome, and Kinetics-600), mainly related to visual relations and human-object interactions. Second, zero-shot events are often only briefly and partially described via textual input, and the resulting semantic ambiguity makes it challenging to identify the most indicative high-order concepts. We therefore design a novel query-expansion scheme that enriches ambiguous event-specific keywords by searching either large common knowledge bases (e.g., WikiHow) or top-ranked webpages retrieved from modern search engines. This establishes a more faithful connection between zero-shot events and high-order concepts. To the best of our knowledge, this is the first work that pursues concept-based video search beyond first-order concepts. Extensive experiments have been conducted on several large video benchmarks (TRECVID 2013, TRECVID 2014, and ActivityNet-1.3). The evaluations clearly demonstrate the superiority of our constructed high-order concept library and its complementarity to existing concepts.

High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

Multi-level Feature Representations for Video Semantic Concept Detection

Video Concept Detection Based on Multiple Features and Classifiers Fusion

Video Semantic Concept Detection Using Ontology

Video Semantic Concept Detection Using Multi-Modality Subspace Correlation Propagation

Robust Semantic Concept Detection in Large Video Collections

A Knowledge-Assisted Framework for Video Semantic Concept Detection

Hierarchical Latent Concept Discovery for Video Event Detection

Semantic Concept Detection for Video Based on Extreme Learning Machine

Efficient Heuristic Methods for Multimodal Fusion and Concept Fusion in Video Concept Detection

Video Semantic Concept Detection Based on Conceptual Correlation and Boosting

TRECVid 2013 Semantic Video Concept Detection by NTT-MD-DUT

Design and Implementation of Semantic Concept Based Video Retrieval System

A Comprehensive Representation Scheme for Video Semantic Ontology and Its Applications in Semantic Concept Detection

Exploiting Concept Association to Boost Multimedia Semantic Concept Detection

Analysis and Understanding for Multi-Level Video Semantic Concepts

Video diver: generic video indexing with diverse features

Semantic Video Search by Exploiting Large-Scale Visual Concepts

Zero-Shot Video Event Detection With High-Order Semantic Concept Discovery and Matching

A Novel Framework for Concept Detection on Large Scale Video Database and Feature Pool

Non-rigid Video Object Segmentation Based on Semantic Multi-level Framework