Using Phoneme Recognition and Text-Dependent Speaker Verification to Improve Speaker Segmentation for Chinese Speech.

Gang Wang, Xiaojun Wu, Thomas Fang Zheng
DOI: https://doi.org/10.21437/interspeech.2010-148
2010-01-01
Abstract: Speaker segmentation is widely used in tasks such as multi-speaker detection and speaker tracking. Segmentation performance depends to a large extent on the performance of speaker verification (SV) between two short utterances, so improving SV for short utterances would substantially benefit segmentation. In this paper, a method based on phoneme recognition and text-dependent speaker recognition is proposed. During segmentation, a phoneme sequence is first recognized using a phoneme recognizer, and then text-dependent speaker recognition based on dynamic time warping (DTW) is performed on the same phoneme in two adjacent windows. Experiments on the Chinese Corpus Consortium (CCC) MSS database showed that better performance was achieved than with the BIC method and the GLR method.
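The comparison step described in the abstract (DTW applied to the same phoneme in two adjacent windows) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each window has already been decoded into phoneme-labelled segments of feature frames. The function names, the Euclidean frame distance, the length normalization, the best-pair matching, and the averaging over shared phonemes are all hypothetical choices made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    # Normalizing by n + m is an assumption, not taken from the paper.
    return cost[n][m] / (n + m)


def window_distance(left, right):
    """Compare two adjacent windows phoneme by phoneme.

    left, right: dict mapping a phoneme label to a list of segments,
    where each segment is a list of feature frames.
    Returns the mean DTW distance over phonemes found in both windows,
    or None if the windows share no phoneme.
    """
    shared = sorted(set(left) & set(right))
    if not shared:
        return None
    scores = []
    for phoneme in shared:
        # Assumed pairing rule: keep the best-matching pair of segments.
        best = min(dtw_distance(a, b)
                   for a in left[phoneme]
                   for b in right[phoneme])
        scores.append(best)
    return sum(scores) / len(scores)
```

Under these assumptions, a large `window_distance` between adjacent windows would suggest a speaker change at their boundary. Any decision threshold would have to be tuned on held-out data and is not specified here.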