Abstract: Event analysis in untrimmed videos has attracted increasing attention with the application of cutting-edge techniques such as convolutional neural networks (CNNs). As a well-studied property of CNN-based models, the receptive field measures the spatial range covered by a single feature response and is crucial for improving image categorization accuracy. In the video domain, event semantics are described by complex interactions among different concepts, and the behavior of those concepts varies drastically from one video to another, which makes concept-based analytics for accurate event categorization difficult. To model concept behavior, we study the temporal concept receptive field of concept-based event representations, which encodes the temporal occurrence pattern of different mid-level concepts. Accordingly, we introduce temporal dynamic convolution (TDC) to give concept-based event analytics stronger flexibility. TDC adjusts the temporal concept receptive field size dynamically according to each input. Specifically, a set of coefficients is learned to fuse the results of multiple convolutions whose different kernel widths provide different temporal concept receptive field sizes. Different coefficients yield an appropriate and accurate temporal concept receptive field size for each input video and highlight crucial concepts. Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis. Experimental results on FCVID and ActivityNet show that TDCMN exhibits adaptive event recognition ability conditioned on different inputs and improves the event recognition performance of concept-based methods by a large margin. Code is available at https://github.com/qzhb/TDCMN.
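The coefficient-weighted fusion of multi-width temporal convolutions described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the TDCMN implementation: the input is assumed to be a (T, C) matrix of concept scores over T temporal segments, each branch is assumed to be a per-concept (depthwise) temporal convolution with zero "same" padding, and the input-dependent coefficients are assumed to come from a softmax over a linear projection of the time-averaged concept scores. The names `temporal_conv_same`, `temporal_dynamic_conv`, `gate_w` and `gate_b` are hypothetical.

```python
import numpy as np


def temporal_conv_same(x, kernel):
    """Per-concept 1D convolution along time with zero 'same' padding.

    x:      (T, C) concept scores over T temporal segments (assumed layout).
    kernel: (k, C) one temporal filter per concept; k is odd.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros_like(x, dtype=float)
    for t in range(x.shape[0]):
        # Weighted sum over a window of k segments centred on segment t,
        # so this branch's temporal receptive field is k segments wide.
        out[t] = np.sum(xp[t:t + k] * kernel, axis=0)
    return out


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()


def temporal_dynamic_conv(x, kernels, gate_w, gate_b):
    """Fuse branch outputs with coefficients computed from the input itself.

    kernels: list of K arrays of shape (k_i, C) with different widths k_i.
    gate_w:  (C, K) and gate_b: (K,) are hypothetical gating parameters.
    Returns the fused (T, C) output and the (K,) coefficients.
    """
    summary = x.mean(axis=0)                      # (C,) video-level summary
    coeffs = softmax(summary @ gate_w + gate_b)   # (K,), sums to 1
    branches = [temporal_conv_same(x, kern) for kern in kernels]
    fused = sum(c * b for c, b in zip(coeffs, branches))
    return fused, coeffs


# Usage: three branches with temporal widths 1, 3 and 5 (moving averages).
rng = np.random.default_rng(0)
T, C = 8, 4
x = rng.random((T, C))
kernels = [np.full((w, C), 1.0 / w) for w in (1, 3, 5)]
gate_w = rng.normal(size=(C, 3))
gate_b = np.zeros(3)
y, a = temporal_dynamic_conv(x, kernels, gate_w, gate_b)
print(y.shape, a.round(3))
```

Because the coefficients sum to one, an input whose gate favors the width-5 branch effectively aggregates concept responses over a wider temporal window than one whose gate favors the width-1 branch; in this sketch, that is how the temporal receptive field changes from one input to another.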

TCM: Temporal Consistency Model for Head Detection in Complex Videos

Real-time Follow-Up Head Tracking in Dynamic Complex Environments

Attention-guided Temporally Coherent Video Object Matting

TempT: Temporal consistency for Test-time adaptation

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Human Head Pose Estimation Through Temporal Enhanced and Accurate Self-Supervised Depth Prediction

Temporally Identity-Aware SSD With Attentional LSTM

Alignment-guided Temporal Attention for Video Action Recognition

Multi-Scale Temporal Relations and Segmented Channel Attention for Video Anomaly Detection

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Campus Abnormal Behavior Recognition With Temporal Segment Transformers

TAM: Temporal Adaptive Module for Video Recognition

TSI: Temporal Saliency Integration for Video Action Recognition

Temporal-Spatial Mapping for Action Recognition

MCMNET: Multi-Scale Context Modeling Network for Temporal Action Detection

TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC

Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking

Machine Learning for Single and Complex 3D Head Gestures: Classification in Human-Computer Interaction

TKD: Temporal Knowledge Distillation for Active Perception

Intrusion Detection Model Using Temporal Convolutional Network Blend Into Attention Mechanism

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications