Abstract: This paper proposes a systematic framework to address an essential problem in content-based retrieval: the semantic gap between extractable low-level features and meaningful high-level semantics. Low-level features such as color histograms, inter-frame differences, and edges can be extracted directly from video streams. In theory, events can be detected from these features with hidden Markov models (HMMs) or dynamic Bayesian networks (DBNs), but in practice such models are too complicated to build and train. This paper proposes to use cues as an intermediate step that bridges the gap between low-level features and high-level events. Cues have two salient characteristics: they are causally related to the events, and they can be deduced from features or extracted from video streams. Based on this idea, a systematic framework is constructed to analyze soccer videos, which are chosen as a test bed because events (i.e., shoot events, foul events, and normal play) can be clearly defined in a soccer game. First, the input video stream is segmented into shots based on features extracted directly from the video. Second, semantic cues, such as slow-motion replay, player faces, captions, goalmouth, and shot frequency, are deduced or extracted from the shots. Third, three HMMs are built and trained to infer the three events from the cues. In general, a video stream contains more than one event, so for HMM-based event inference the shots must be grouped into sets that each contain only one event. In other words, the shots must be grouped so that each input observation sequence (a set of shots) fed into the HMMs fits at least one of the models. Because grouping and recognition are intertwined in this way, a hierarchical HMM (HHMM) is employed to group shots and recognize events simultaneously in the video stream. The experiments show that the system is effective and robust in inferring events from roughly deduced or extracted cues.
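
The third step described above, inferring one of three events from a sequence of cues, can be illustrated with a minimal sketch. It assumes that each shot has already been reduced to a single discrete cue symbol and that the shots have already been grouped into one set, so it covers only per-set classification and not the hierarchical HMM that groups and recognizes jointly. The cue alphabet, the state counts, and every probability below are hypothetical toy values chosen for illustration; none of them come from the paper. Each sequence is scored under the three models with the scaled forward algorithm, and the most likely event is selected.

```python
import math

# Hypothetical cue alphabet (one symbol per shot), assumed for illustration:
# 0 = plain play, 1 = goalmouth, 2 = slow-motion replay, 3 = player face close-up
# Each model is (start, trans, emit); all values are made-up toy parameters.
MODELS = {
    "shoot": (
        [0.8, 0.2],
        [[0.6, 0.4], [0.2, 0.8]],
        [[0.10, 0.70, 0.10, 0.10],   # state 0: attack near the goalmouth
         [0.05, 0.15, 0.70, 0.10]],  # state 1: replay after the attack
    ),
    "foul": (
        [0.8, 0.2],
        [[0.6, 0.4], [0.2, 0.8]],
        [[0.30, 0.10, 0.10, 0.50],   # state 0: close-ups of players
         [0.10, 0.05, 0.45, 0.40]],  # state 1: replay mixed with close-ups
    ),
    "normal": (
        [1.0],
        [[1.0]],
        [[0.85, 0.05, 0.05, 0.05]],  # single state dominated by plain play
    ),
}


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm."""
    n = len(start)
    # Initialisation: joint probability of each state and the first symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs):
    """Pick the event whose HMM assigns the highest likelihood to the cue sequence."""
    return max(MODELS, key=lambda name: log_likelihood(obs, *MODELS[name]))


if __name__ == "__main__":
    for cues in ([1, 1, 2, 2], [3, 3, 2, 3], [0, 0, 0, 0]):
        print(cues, "->", classify(cues))
```

In a real system the model parameters would be estimated from labelled training sequences rather than set by hand, and the fixed grouping assumed here would be replaced by the joint grouping and recognition that the abstract attributes to the HHMM.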

Event Detection by Fusing Multimodal Objects Using HMM

A HMM Based Semantic Analysis Framework For Sports Game Event Detection

A Fusion Scheme of Visual and Auditory Modalities for Event Detection in Sports Video

Cues extraction and hierarchical HMM based events inference in soccer video

Modality Mixture Projections for Semantic Video Event Detection

Motion Based Event Recognition Using HMM

Event Detection In Basketball Video Using Multiple Modalities

Semantic event detection via multimodal data mining

Hidden Markov Model Based Events Detection In Soccer Video

Multi-Mode Semantic Cues Based on Hidden Conditional Random Field in Soccer Video

An HMM-based framework for video semantic analysis

Complex Video Event Detection Via Pairwise Fusion of Trajectory and Multi-Label Hypergraphs

Semantic Event Extraction From Basketball Games Using Multi-Modal Analysis

Integration of Multimodal Features for Video Scene Classification Based on HMM

Video Event Detection using ICA Mixture Hidden Markov Models

Multimedia Scene Detection Based on Fusion of Bi-Modal Features

Multimodal feature fusion for robust event detection in web videos

Discovering Joint Audio–visual Codewords for Video Event Detection

Event Analysis in Soccer Video by Dynamic Programming Based Fusion of Multiple Modalities

Multi-scale Harmonic Mean Time Surfaces for Event-based Object Classification

Audio Content-based Highlight Detection Using Adaptive Hidden Markov Model