Abstract: Meaningful representation and effective retrieval of video shots in a large-scale database have been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to extracting low-level visual features, such as color, shape, texture, and motion, to characterize and retrieve video shots. However, the accuracy of these feature descriptors is still far from satisfactory because of the well-known semantic gap. To alleviate this problem, this paper investigates a novel methodology for representing and retrieving video shots using human-centric high-level features derived in brain imaging space (BIS), where brain responses to the natural stimulus of video watching can be explored and interpreted. First, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale functional brain networks and their regions of interest (ROIs) that are involved in comprehending the video stimulus. Next, functional connectivities between pairs of functional ROIs are used as BIS features to characterize the brain's comprehension of video semantics. An effective feature selection procedure is then applied to retain the most relevant features while removing redundancy, which yields the final BIS features. Afterwards, a mapping from low-level visual features to high-level semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is inferred in which video key frames are represented by their mapped feature vectors in the BIS. Finally, the manifold-ranking algorithm, which takes into account the relationships among all data points, is applied to measure the similarity between key frames of video shots. Experimental results on the TRECVID 2005 dataset demonstrate the superiority of the proposed approach over traditional methods.
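The pipeline above chains several standard techniques. The pure-Python sketch below illustrates three of them on made-up toy data, under stated assumptions: BIS features as Pearson correlations between ROI time series (the abstract does not name the connectivity measure, so this is an assumption), a GPR posterior mean with an RBF kernel (kernel choice and hyperparameters are assumptions), and manifold ranking in its commonly used closed form f = (I - alpha*S)^-1 y with S = D^-1/2 W D^-1/2. DICCCOL localization and the feature selection step are omitted, and all function names (`connectivity_features`, `gpr_predict`, `manifold_rank`, etc.) are hypothetical, not the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_features(roi_series):
    """Assumed BIS feature vector: correlation of every ROI pair (upper triangle)."""
    k = len(roi_series)
    return [pearson(roi_series[i], roi_series[j])
            for i in range(k) for j in range(i + 1, k)]


def rbf(a, b, length):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 * length^2))."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def gpr_predict(X_train, Y_train, X_test, length=1.0, noise=1e-2):
    """GPR posterior mean k_*^T (K + noise*I)^-1 Y, one column per BIS feature."""
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, Y_train)  # n x d matrix of dual coefficients
    out = []
    for x in X_test:
        k_star = [rbf(x, xi, length) for xi in X_train]
        out.append([sum(k_star[i] * weights[i][d] for i in range(n))
                    for d in range(len(Y_train[0]))])
    return out


def manifold_rank(points, query, alpha=0.9, sigma=0.5):
    """Manifold ranking scores f = (I - alpha*S)^-1 y for a single query index."""
    n = len(points)
    # Affinity matrix with zero diagonal, then symmetric normalization.
    W = [[0.0 if i == j else rbf(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    A = [[(1.0 if i == j else 0.0) - alpha * W[i][j] / math.sqrt(deg[i] * deg[j])
          for j in range(n)] for i in range(n)]
    y = [[1.0 if i == query else 0.0] for i in range(n)]
    return [row[0] for row in solve(A, y)]


if __name__ == "__main__":
    # Toy, made-up data: 2-D "low-level" features for four training key frames
    # with 1-D BIS targets; a fifth key frame without fMRI data is mapped by GPR.
    X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
    Y = [[0.0], [0.0], [1.0], [1.0]]
    mapped = gpr_predict(X, Y, X + [[0.05, 0.0]])
    scores = manifold_rank(mapped, query=4)
    print(sorted(range(len(scores)), key=lambda i: -scores[i]))
```

In this toy run the query key frame ranks first, followed by the two training frames that lie near it in the mapped space. The dense linear solve is chosen for readability; a real system with many key frames would use a numerical library and would likely tune the kernel widths and alpha.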

A Compact Shot Representation for Video Semantic Indexing

A Novel Compact Yet Rich Key Frame Creation Method for Compressed Video Summarization

New Fusional Framework Combining Sparse Selection and Clustering for Key Frame Extraction

Video Content Representation for Shot Retrieval and Scene Extraction

A Unified Framework for Semantic Shot Representation of Sports Video

Key frame vector and its application to shot retrieval

Shot Content Analysis for Video Retrieval Applications

Efficient Semantic Video Segmentation with Per-Frame Inference

A New Hierarchical Key Frame Tree-Based Video Representation Method Using Independent Component Analysis

Condensing a Sequence to One Informative Frame for Video Recognition

An Improved Video Identification Scheme Based on Video Tomography

Global Motion Representation of Video Shot Based on Vector Quantization Index Histogram

An Efficient Approach Based on Image Pixel and Semantic Features Towards Video Retrieval

A lightweight weak semantic framework for cinematographic shot classification

Video diver: generic video indexing with diverse features

Method Based On Temporal Constrain Shot Similarity

Video Scene Extraction by Force Competition

Automatic Moving Object Extraction toward Content-Based Video Representation and Indexing

Representing and retrieving video shots in human-centric brain imaging space

OCSampler: Compressing Videos to One Clip with Single-step Sampling

Semantic-Aware Visual Decomposition for Image Coding