Multi-modality Video Shot Clustering with Tensor Representation

Yanan Liu, Fei Wu
DOI: https://doi.org/10.1007/s11042-008-0220-5
IF: 2.577
2008-01-01
Multimedia Tools and Applications
Abstract: Video analysis and understanding remain challenging problems. Video data has multiple media modalities, which exhibit temporal-sequenced associated co-occurrence (TSAC). Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these high-dimensional vectors for dimensionality reduction, classification, clustering, and recognition. However, the modalities in video not only have their own properties but are also correlated with one another; a simple vector representation weakens the power of these relatively independent modalities and, to some extent, ignores the relations between them. Clustering is an important technique for multimedia data management, and a powerful clustering algorithm named Affinity Propagation has recently been devised. In this paper, we introduce a higher-order tensor framework for video analysis. In this framework, the three modalities of a video shot (image frames, audio stream, and transcript text) are represented together as data points in the form of third-order tensors. We also present a dimension reduction method for the high-dimensional features of video shots that explicitly considers the manifold structure of the tensor space formed by temporal-sequenced associated co-occurring multimodal media data. We call this the TensorShot approach. We then use Affinity Propagation to cluster the video shots in tensor form. Our algorithm preserves the intrinsic structure of the submanifold from which the tensorshots are sampled. Experiments on the TRECVID2005 news video data set show that our algorithm achieves improved performance.
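The clustering step described in the abstract can be illustrated with a minimal sketch: Affinity Propagation run over shots stored as third-order tensors. Everything specific here is an assumption rather than the authors' method. The function names `frobenius_similarity` and `affinity_propagation` are hypothetical, the six toy tensors are made up, the similarity is a plain negative squared Frobenius distance, and the preference is set to the median off-diagonal similarity. The TensorShot dimension reduction step is not reproduced.

```python
import numpy as np


def frobenius_similarity(shots):
    """Negative squared Frobenius distance between third-order tensors.

    Assumption: the abstract does not state which similarity is used;
    this choice is only for illustration.
    """
    flat = shots.reshape(len(shots), -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return -(diff ** 2).sum(axis=-1)


def affinity_propagation(S, damping=0.5, iters=200):
    """Plain Affinity Propagation on a similarity matrix S.

    The diagonal of S must already hold the preferences.
    Returns (exemplar indices, cluster label per point).
    """
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new

        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        # No exemplar emerged; mark every point as unassigned.
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Toy data (made up): six "shots", each a 2x2x2 tensor. The three axes only
# stand in for the image, audio, and text modes. Each group has one central
# shot and two shots displaced from it in opposite directions.
center = np.zeros((2, 2, 2))
step = np.zeros((2, 2, 2))
step[0, 0, 0] = 2.0
shift = np.zeros((2, 2, 2))
shift[1, 1, 1] = 3.0
group_a = [center, center + step, center - step]
group_b = [t + shift for t in group_a]
shots = np.stack(group_a + group_b)

S = frobenius_similarity(shots)
off_diagonal = S[~np.eye(len(S), dtype=bool)]
np.fill_diagonal(S, np.median(off_diagonal))  # shared preference (assumption)

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

Because the Frobenius distance between tensors equals the Euclidean distance between their flattened forms, this similarity does not by itself exploit the tensor structure; in the paper that role belongs to the tensor-space dimension reduction applied before clustering.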