Event Detection by Fusing Multimodal Objects Using HMM

张玉珍,丁思捷,王建宇,戴跃伟,陈钱
DOI: https://doi.org/10.16182/j.cnki.joss.2012.08.030
2012-01-01
Abstract: Automatic detection of semantic events in sports videos is a challenging task. Video carries semantic information in multiple modalities, and fusing this information helps a computer accurately retrieve the events that people need. This paper proposes an efficient sports video event detection method that integrates multimodal objects based on the hidden Markov model (HMM). First, the audio stream is extracted from the video and classified using a continuous HMM (CHMM). Then, the audio objects and the video stream are fused according to their temporal correspondence, and highlight events such as shooting events and fouls, as well as general events, are detected in the corresponding video shots by fusing the multimodal objects with a discrete HMM (DHMM). Among the detected shooting events, scoring events are identified on the basis of caption appearance. In addition, the structure, initialization, and parameter constraints of the DHMM are described in detail. Experiments demonstrate the high efficiency of the proposed method.
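The DHMM decision step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three-symbol observation alphabet, the two-state left-to-right models, every probability value, and the names `log_likelihood`, `classify`, and `refine_with_caption` are assumptions chosen for illustration. The sketch uses one common way of applying discrete HMMs to classification: each candidate event gets its own model, the fused observation sequence of a video shot is scored with the scaled forward algorithm, and the event whose model gives the highest likelihood is selected. The caption check for scoring events follows the rule stated in the abstract.

```python
import math

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

# Hypothetical fused symbols (assumed, not taken from the paper):
#   0 = calm audio + far view, 1 = excited audio + close-up, 2 = whistle + close-up
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]  # assumed two-state left-to-right structure
MODELS = {
    "shoot":   ([1.0, 0.0], LEFT_TO_RIGHT, [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]),
    "foul":    ([1.0, 0.0], LEFT_TO_RIGHT, [[0.6, 0.1, 0.3], [0.1, 0.1, 0.8]]),
    "general": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]),
}

def classify(obs, models=MODELS):
    """Return the event label whose DHMM assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

def refine_with_caption(label, caption_appears):
    """Mark a detected shoot as a scoring event when a caption appears."""
    return "score" if label == "shoot" and caption_appears else label

if __name__ == "__main__":
    shot_observations = [0, 1, 1, 1]  # toy sequence for one video shot
    label = classify(shot_observations)
    print(refine_with_caption(label, caption_appears=True))
```

In practice the model parameters would be trained on labeled shots rather than written by hand, and the observation alphabet would come from the actual audio classes and visual objects used for fusion.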