Abstract: Event analysis in untrimmed videos has attracted increasing attention due to the application of cutting-edge techniques such as convolutional neural networks (CNNs). The receptive field, a well-studied property of CNN-based models, measures the spatial range covered by a single feature response and is crucial for improving image categorization accuracy. In the video domain, event semantics are described by complex interactions among different concepts, and the behavior of these concepts varies drastically from one video to another, which makes accurate event categorization difficult for concept-based analytics. To model concept behavior, we study the temporal concept receptive field of concept-based event representations, which encodes the temporal occurrence pattern of different mid-level concepts. Accordingly, we introduce temporal dynamic convolution (TDC) to give concept-based event analytics greater flexibility. TDC adjusts the temporal concept receptive field size dynamically according to the input. Specifically, a set of coefficients is learned to fuse the results of multiple convolutions with different kernel widths, each providing a different temporal concept receptive field size. Different coefficients yield an appropriate temporal concept receptive field size for each input video and highlight crucial concepts. Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis. Experimental results on FCVID and ActivityNet show that TDCMN adapts its event recognition to different inputs and improves the event recognition performance of concept-based methods by a large margin. Code is available at https://github.com/qzhb/TDCMN.
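The coefficient-weighted fusion of multi-width temporal convolutions described above can be illustrated with a minimal pure-Python sketch. This is not the authors' TDCMN code (the linked repository holds that); it rests on several assumptions: the input is a T x C matrix of per-snippet concept scores, each branch applies one odd-width temporal kernel to every concept channel with zero padding, and the fusion coefficients come from a time-averaged concept summary passed through a linear layer and a softmax. The names `conv1d_same`, `temporal_dynamic_conv`, `gate_w`, and `gate_b` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # shape (T, C): T time steps, C concept scores


def conv1d_same(x: Matrix, kernel: Sequence[float]) -> Matrix:
    """Apply one odd-width temporal kernel to every concept channel.

    Zero padding keeps the output length equal to T. A wider kernel
    corresponds to a larger temporal concept receptive field.
    """
    k = len(kernel)
    if k % 2 != 1:
        raise ValueError("kernel width must be odd")
    T, C = len(x), len(x[0])
    pad = k // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for j, w in enumerate(kernel):
            src = t + j - pad
            if 0 <= src < T:  # positions outside the sequence count as zero
                for c in range(C):
                    out[t][c] += w * x[src][c]
    return out


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def temporal_dynamic_conv(
    x: Matrix,
    kernels: Sequence[Sequence[float]],
    gate_w: Sequence[Sequence[float]],  # hypothetical: one row of C weights per branch
    gate_b: Sequence[float],            # hypothetical: one bias per branch
) -> Tuple[Matrix, List[float]]:
    """Fuse multi-width branch outputs with input-conditioned coefficients."""
    T, C = len(x), len(x[0])
    # Assumed gating: average each concept over time to summarize the video.
    summary = [sum(x[t][c] for t in range(T)) / T for c in range(C)]
    logits = [sum(w * s for w, s in zip(row, summary)) + b
              for row, b in zip(gate_w, gate_b)]
    alpha = softmax(logits)  # one coefficient per kernel width, summing to 1
    branches = [conv1d_same(x, k) for k in kernels]
    fused = [[sum(a * br[t][c] for a, br in zip(alpha, branches))
              for c in range(C)] for t in range(T)]
    return fused, alpha


if __name__ == "__main__":
    x = [[0.1, 0.9], [0.4, 0.5], [0.8, 0.2], [0.3, 0.7]]
    kernels = [[1.0], [1 / 3] * 3, [0.2] * 5]  # widths 1, 3, 5
    fused, alpha = temporal_dynamic_conv(x, kernels, [[0.0, 0.0]] * 3, [0.0, 0.0, 0.0])
    print(alpha)
```

With all gate weights at zero the three coefficients are uniform. A large bias on the width-1 branch drives its coefficient toward 1, so the fused output collapses to the input. This shows, under these assumptions, how the learned coefficients select an effective temporal receptive field size per video.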

Marginalized multi-layer multi-instance kernel for video concept detection

Multi-instance Kernel Learning with Concept Weights of Instance Space

A Unifying Multi-Label Temporal Kernel Machine with Its Application to Video Annotation

A Novel Video Object Tracking Approach Based on Kernel Density Estimation and Markov Random Field

Multi-kernel Multi-Label Learning with Max-Margin Concept Network

Markov Chain Local Binary Pattern And Its Application To Video Concept Detection

Online Kernel-Based Structured Output SVM for Early Expression Detection

Robust Video Identification Approach Based on Local Non-Negative Matrix Factorization

Active post-refined multimodality video semantic concept detection with tensor representation

Per-sample Multiple Kernel Approach for Visual Concept Learning

Correlative Multilabel Video Annotation with Temporal Kernels

Multiple Hypergraph Ranking for Video Concept Detection

Tensor-based transductive learning for multimodality video semantic concept detection

Parallel Lasso for Large-Scale Video Concept Detection

Multi-Modality Transfer Based on Multi-Graph Optimization for Domain Adaptive Video Concept Annotation

Robust Semantic Concept Detection in Large Video Collections

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

Exploiting Generalized Discriminative Multiple Instance Learning for Multimedia Semantic Concept Detection

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Exploring Multi-Modality Structure for Cross Domain Adaptation in Video Concept Annotation

Video Semantic Concept Detection Using Multi-Modality Subspace Correlation Propagation