A Speaker Identification System for Video Content Analysis

Jing Bi, Shu-Chang Liu
DOI: https://doi.org/10.1109/IIH-MSP.2008.215
2009-01-01
Abstract: Recently, a growing body of literature has proposed applying audio content analysis techniques to content-based video parsing. This paper presents our current work on a speaker identification system for video content analysis. The system differs from conventional ones in three respects: first, the soundtrack extracted from a video stream includes not only silence and speech but also music and environmental sound; second, the number of speakers in the video content is uncertain; third, the presence of noise in the video can significantly degrade system performance. To address these considerations, our speaker identification system comprises the following basic parts: audio classification and segmentation using a rule-based and support vector machine (SVM) based classifier; speech clustering using a spectral clustering technique and speaker identification based on Gaussian mixture models (GMM); and speech enhancement based on spectral subtraction. Experiments are carried out on a database extracted from news, conversation, and movie videos. The results confirm the validity of the proposed system architecture.
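The abstract names spectral subtraction as the speech enhancement step without giving details. As a rough illustration only, the sketch below implements the basic textbook form of magnitude spectral subtraction with NumPy. It is not the authors' implementation: the function name `spectral_subtraction`, the frame length, hop size, spectral floor, and the assumption that the first few frames contain only noise are all choices made here for the example.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, hop=128, noise_frames=5, floor=0.02):
    """Basic magnitude spectral subtraction (illustrative sketch, not the paper's method).

    Assumes the first `noise_frames` frames of `signal` contain noise only.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Split the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])

    # Short-time spectrum: keep magnitude and phase separately.
    spectra = np.fft.rfft(frames, axis=1)
    magnitude = np.abs(spectra)
    phase = np.angle(spectra)

    # Noise magnitude estimate: average over the assumed noise-only frames.
    noise_mag = magnitude[:noise_frames].mean(axis=0)

    # Subtract the noise estimate, keeping a small spectral floor
    # so that no bin becomes negative.
    cleaned = np.maximum(magnitude - noise_mag, floor * magnitude)

    # Rebuild frames using the original (noisy) phase.
    cleaned_frames = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len, axis=1)

    # Weighted overlap-add back into a single signal.
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i in range(n_frames):
        start = i * hop
        out[start : start + frame_len] += cleaned_frames[i] * window
        norm[start : start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-3)
```

In a pipeline like the one described, a step of this kind would presumably run on speech segments before feature extraction for the GMM stage; where exactly the authors apply it is not stated in the abstract. Samples past the last full frame are left at zero in this sketch.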