Audio-Visual System for Robust Speaker Recognition.

Q Chen, JG Yang, J Gou
2005-01-01
Abstract: Automatic speaker recognition systems based solely on the acoustic signal degrade considerably as the background changes. However, the shape and motion of a speaker's lips can help improve speech perception. Motivated by this, we use lip information extracted from the visual scene together with acoustic features in a speaker recognition system. In this paper, we detect the human face in each video frame in two steps. A Support Vector Machine model is used to retrieve static and dynamic shape features of the lips. A late fusion module is investigated to determine the speaker's identity by combining the results of two independent subsystems, a visual GMM model and an acoustic GMM model. Experiments show that the audio-visual technique outperforms the use of video or audio data alone under both clean and degraded speech conditions.
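
The late fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the code below assumes a weighted sum of min-max normalized per-speaker log-likelihoods; the function names, the `audio_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
def normalize(scores):
    """Min-max scale a dict of per-speaker scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All speakers scored identically, so this modality carries no evidence.
        return {speaker: 0.0 for speaker in scores}
    return {speaker: (s - lo) / (hi - lo) for speaker, s in scores.items()}


def late_fusion(audio_ll, visual_ll, audio_weight=0.7):
    """Combine per-speaker scores from two independent subsystems.

    audio_ll and visual_ll map speaker id -> log-likelihood under that
    speaker's acoustic or visual GMM (assumed to be computed elsewhere).
    audio_weight is an assumed tuning parameter in [0, 1].
    Returns (best_speaker, fused_scores).
    """
    a = normalize(audio_ll)
    v = normalize(visual_ll)
    fused = {s: audio_weight * a[s] + (1.0 - audio_weight) * v[s] for s in a}
    return max(fused, key=fused.get), fused


# Made-up scores: audio slightly prefers "bob", video clearly prefers "alice".
audio_ll = {"alice": -120.0, "bob": -118.0, "carol": -130.0}
visual_ll = {"alice": -40.0, "bob": -55.0, "carol": -60.0}
best, fused = late_fusion(audio_ll, visual_ll, audio_weight=0.7)
```

With these made-up scores the fused decision is "alice", even though the audio subsystem alone would pick "bob". In a degraded-audio setting one might lower `audio_weight` so that the visual subsystem contributes more; how the paper sets this balance is not stated in the abstract.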