A pitch-based rapid speech segmentation for speaker indexing

Min Yang, Yingchun Yang, Zhaohui Wu
DOI: https://doi.org/10.1109/ISM.2005.17
2005-01-01
Abstract: Segmentation of continuous audio is an important processing step in many applications. In speaker indexing, the reliability of the speaker model depends heavily on segmentation. Commonly used methods are based on the Bayesian information criterion (BIC), which is less effective on short utterances. In this paper, we present a pitch-based speech segmentation method that detects frequent speaker changes accurately and rapidly. First, utterance segments are detected by pitch. Then pitch distances are computed and compared with a self-adaptive threshold. Finally, speaker changes are decided among the utterance segments. We applied our method and three comparative methods to the HUB4-NE broadcast data, and ran speaker indexing experiments following each algorithm. We also propose two indicators that complement the false alarm rate and the miss rate in the evaluation of segmentation. The experimental results show that our algorithm works faster and better, detecting most short-duration speaker changes. The speaker indexing equal error rate of our method is 10.43%, which is lower than the 12.94%, 25.84%, and 15.91% of the other methods.
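The three steps named in the abstract (pitch-based utterance detection, pitch distances, adaptive thresholding) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: a frame counts as voiced when its pitch value is positive, the distance between adjacent segments is the absolute difference of their mean pitch, and the threshold is the mean of all distances plus `alpha` times their standard deviation. The function names, `alpha`, and `min_len` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def voiced_segments(pitch, min_len=3):
    """Group consecutive voiced frames (pitch > 0) into (start, end) pairs.

    `end` is exclusive. Runs shorter than `min_len` frames are dropped.
    Treating pitch > 0 as "voiced" is an assumption of this sketch.
    """
    segments, start = [], None
    for i, f0 in enumerate(pitch):
        if f0 > 0 and start is None:
            start = i                      # a voiced run begins
        elif f0 <= 0 and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None                   # the voiced run ended
    if start is not None and len(pitch) - start >= min_len:
        segments.append((start, len(pitch)))
    return segments


def detect_speaker_changes(pitch, alpha=1.0, min_len=3):
    """Return start frames of segments that follow a suspected speaker change.

    Assumed distance: absolute difference of mean pitch between adjacent
    segments. Assumed adaptive threshold: mean + alpha * std of the distances.
    """
    segs = voiced_segments(pitch, min_len)
    means = [mean(pitch[s:e]) for s, e in segs]
    dists = [abs(b - a) for a, b in zip(means, means[1:])]
    if not dists:
        return []                          # fewer than two segments
    threshold = mean(dists) + alpha * pstdev(dists)
    return [segs[i + 1][0] for i, d in enumerate(dists) if d > threshold]


# Toy pitch track in Hz (0 = unvoiced): two low-pitched utterances
# followed by two high-pitched ones.
pitch = [0, 0, 120, 122, 121, 0, 0, 119, 121, 120,
         0, 210, 212, 211, 0, 0, 208, 210, 209, 0]
print(detect_speaker_changes(pitch))
```

Because the threshold is derived from the distances within the recording itself, it adapts to the pitch variability of each input rather than relying on a fixed value; the paper's own threshold rule may differ.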