Abstract: The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data, including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes, or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing any supervised annotations, between different scenes is, however, challenging for two reasons. First, some scenes are totally unrelated, so any information sharing between them would be detrimental, while others share only a subset of common activities, so information sharing is useful only if it is selective. Second, semantically similar activities that should be modelled together and shared across scenes may have quite different pixel-level appearance in each scene. To address these issues, we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours, and further discovers which subset of activities is shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks, including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case, we demonstrate how our multi-scene model improves on a collection of standard single-scene models and a flat model of all scenes.

Video scene segmentation and semantic representation using a novel scheme

Scene Segmentation Based on Video Structure and Spectral Methods

Description and Browsing of Video Story Structure

Video Content Representation for Shot Retrieval and Scene Extraction

Spatio-Temporal Video Segmentation of Static Scenes and Its Applications

Non-rigid Video Object Segmentation Based on Semantic Multi-level Framework

Optimized Video Scene Segmentation

Video Scene Segmentation Using Sequential Change Detection

Video abstraction based on the visual attention model and online clustering

Contour Based Automatic Scene Segmentation in Image Sequences

Automatic video scene segmentation based on spatial-temporal clues and rhythm

Video News Indexing Using Semantic-Face

An Efficient Scene Detection Using Rough Set-Based Fuzzy Clustering for Film Video

A novel video abstraction method based on fast clustering of the regions of interest in key frames

Improving Semantic Scene Categorization by Exploiting Audio-Visual Features

Spatio-Temporal Segmentation with Depth-Inferred Videos of Static Scenes

Video Scene Extraction by Force Competition

Scene Segmentation and Categorization Using NCuts

Efficient shot cluster-based scene detection method for movie video retrieval

Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization

User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction