Integration of Multimodal Features for Video Scene Classification Based on HMM

Jincheng Huang, Zhu Liu, Yao Wang, Yu Chen, Edward K. Wong
DOI: https://doi.org/10.1109/mmsp.1999.793797
1999-01-01
Abstract: Along with advances in multimedia and Internet technology, a huge amount of data, including digital video and audio, is generated daily. Tools for the efficient indexing and retrieval of such data are indispensable. Because such data carry multimodal information, effective integration of the modalities is necessary and remains a challenging problem. In this paper, we present four methods for integrating audio and visual information for video classification based on a hidden Markov model (HMM): direct concatenation, product HMM, two-stage HMM, and integration by neural network. Our results show significant improvements over using a single modality.
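
To make the idea of combining per-modality HMM scores concrete, the following is a minimal sketch, not the authors' implementation. It assumes discrete observation symbols and assumes that a product-style integration treats the audio and visual streams as independent given the scene class, so their log-likelihoods add. The function names (`log_likelihood`, `classify_product`) and the model layout are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | discrete HMM).

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][k]  : probability of emitting symbol k in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        # Rescale at every step so long sequences do not underflow.
        scale = sum(alpha)
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def classify_product(audio_obs, visual_obs, models):
    """Pick the class with the highest combined log-likelihood.

    models maps a class label to (audio_hmm, visual_hmm), where each HMM
    is a (start, trans, emit) tuple. Independence of the two modalities
    given the class is an assumption of this sketch.
    """
    scores = {
        label: log_likelihood(audio_obs, *audio_hmm)
        + log_likelihood(visual_obs, *visual_hmm)
        for label, (audio_hmm, visual_hmm) in models.items()
    }
    return max(scores, key=scores.get), scores
```

Because the two streams are scored separately, the audio and visual sequences in this sketch may have different lengths or frame rates. Direct concatenation, by contrast, would need the two feature streams aligned frame by frame before a single HMM is trained on the joined vectors.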