A Compact Shot Representation for Video Semantic Indexing

Jinzhuo Wang, Wenmin Wang, Ronggang Wang, Wen Gao
DOI: https://doi.org/10.1109/icip.2015.7351407
2015-01-01
Abstract: This paper presents a compact shot representation for video semantic indexing (SIN). The proposed representation consists of visual cues from only two frames, a key frame (KF) and a difference frame (DF), both of which are constructed with a spatial pyramid. The KF describes static information, while the generated DF captures non-static information. Each region of the DF is taken from the same location in a selected frame, namely the frame that differs most saliently from the key frame in that region. We also introduce a variation of the DF to further enhance our model. Experimental results on TRECVID SIN demonstrate that our method obtains better accuracy than the state of the art, while requiring less storage space and less time.
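The region-wise construction of the DF can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how saliency is measured or how the pyramid is configured, so the sketch assumes grayscale frames stored as nested lists, a single grid level in place of the full spatial pyramid, and the sum of absolute pixel differences as the saliency measure. The function name `build_difference_frame` and the `grid` parameter are hypothetical.

```python
def build_difference_frame(key_frame, frames, grid=2):
    """Assemble a difference frame cell by cell (illustrative sketch).

    Assumptions not taken from the paper: grayscale frames as nested
    lists, one grid level, and sum of absolute differences as saliency.
    """
    height, width = len(key_frame), len(key_frame[0])
    # Start from a copy of the key frame; every cell is overwritten below.
    diff_frame = [row[:] for row in key_frame]

    for gy in range(grid):
        for gx in range(grid):
            # Pixel bounds of the current grid cell.
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid

            def cell_difference(frame):
                # Sum of absolute differences to the key frame in this cell.
                return sum(
                    abs(frame[y][x] - key_frame[y][x])
                    for y in range(y0, y1)
                    for x in range(x0, x1)
                )

            # Pick the frame that differs most from the key frame here
            # (ties go to the earliest frame in the list).
            best = max(frames, key=cell_difference)

            # Copy that frame's pixels for this cell into the DF.
            for y in range(y0, y1):
                diff_frame[y][x0:x1] = best[y][x0:x1]

    return diff_frame


# Toy shot: a 4x4 key frame and two other frames with local changes.
key = [[0, 0, 0, 0] for _ in range(4)]
f1 = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
f2 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
df = build_difference_frame(key, [f1, f2], grid=2)
```

In this toy shot, the top-left cell of `df` comes from `f1` and the bottom-right cell from `f2`, so a single DF collects the strongest local changes from different frames of the shot.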