A K-phoneme-class Based Multi-Model Method for Short Utterance Speaker Recognition

Chenhao Zhang, Xiaojun Wu, Thomas Fang Zheng, Linlin Wang, Cong Yin
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2013.06.017
2012-01-01
Abstract: In GMM-UBM based text-independent speaker recognition, performance degrades significantly when the test speech is too short. Because text information is helpful in this setting, a K-phoneme-class scoring based multiple phoneme class speaker model method (K-phoneme-class based multi-model method, abbreviated KPCMMM) is proposed. It consists of a phoneme class speech recognition stage followed by a phoneme class dependent multi-model speaker recognition stage, where K is the number of most likely phoneme classes used in the second stage. Two phoneme class definitions, expert-knowledge based and data-driven, are compared, and performance is studied as a function of K. Experimental results show that the data-driven phoneme class definition outperforms the expert-knowledge based one, and that an appropriate value of K leads to much better performance. Compared with the baseline GMM-UBM system, the proposed KPCMMM achieves a relative equal error rate (EER) reduction of 38.60% for text-independent speaker recognition when the test speech is shorter than 2 seconds.
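The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the K selected phoneme classes are combined, so everything below is an assumption made for illustration: scoring is done per frame, the first stage is represented by ready-made phoneme class posteriors, the second stage by precomputed log-likelihoods under class-dependent speaker models and class-dependent background models, and the K classes are combined by a posterior-weighted average of log-likelihood ratios. The names `top_k_classes` and `kpcmmm_score` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def top_k_classes(posteriors: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k most likely phoneme classes for one frame."""
    order = sorted(range(len(posteriors)), key=lambda c: posteriors[c], reverse=True)
    return order[:k]


def kpcmmm_score(
    class_posteriors: Sequence[Sequence[float]],
    speaker_ll: Sequence[Sequence[float]],
    ubm_ll: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Hypothetical K-phoneme-class score for one test utterance.

    class_posteriors[t][c]: stage-1 posterior of phoneme class c at frame t
                            (assumed strictly positive for the chosen classes).
    speaker_ll[t][c]:       log-likelihood of frame t under the claimed
                            speaker's model for class c.
    ubm_ll[t][c]:           log-likelihood of frame t under the background
                            model for class c.
    The combination rule (posterior-weighted mean of log-likelihood ratios
    over the top-k classes) is an assumption, not the paper's stated rule.
    """
    if not class_posteriors:
        raise ValueError("utterance has no frames")
    if not 1 <= k <= len(class_posteriors[0]):
        raise ValueError("k must be between 1 and the number of phoneme classes")

    total = 0.0
    for post, spk, ubm in zip(class_posteriors, speaker_ll, ubm_ll):
        chosen = top_k_classes(post, k)
        weight_sum = sum(post[c] for c in chosen)
        # Renormalise the posteriors over the chosen classes only.
        frame_llr = sum(post[c] * (spk[c] - ubm[c]) for c in chosen) / weight_sum
        total += frame_llr
    # Average over frames so that utterances of different length are comparable.
    return total / len(class_posteriors)


# Toy input: 2 frames, 3 phoneme classes (values are made up for illustration).
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
speaker_ll = [[-1.0, -2.0, -3.0], [-2.0, -1.0, -4.0]]
ubm_ll = [[-2.0, -2.0, -2.0], [-2.0, -2.0, -2.0]]

score_k1 = kpcmmm_score(posteriors, speaker_ll, ubm_ll, k=1)
score_k2 = kpcmmm_score(posteriors, speaker_ll, ubm_ll, k=2)
```

With K = 1 only the single most likely class model is consulted for each frame, while larger K lets additional class models contribute. In this sketch that trade-off is controlled entirely by the `k` argument.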