Score domain speaking rate normalization for speaker recognition

Rouzi AISIKAER, Dong WANG, Lantian LI, Fang ZHENG, Xiaodong ZHANG, Panshi IN
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2018.25.028
2018-01-01
Abstract: Speaking rate variations seriously degrade speaker recognition accuracy. This paper presents a score-domain normalization approach that reduces the impact of these variations. The method uses two kinds of cohort set, global and local, each consisting of speech utterances at different speaking rates; the local cohort sets are obtained by splitting the utterances in the global cohort set according to their relative speaking rates. For each type of impostor in the cohort set, the score distribution is computed against each enrolled speaker, and the scores of the test speech are normalized using these distributions. Experiments on a self-recorded speaking rate database within a GMM-UBM (Gaussian mixture model-universal background model) framework, with the data sparsity problem handled by augmenting the training data, gave a final relative EER (equal error rate) reduction of 33.33%. The results indicate that both global and local score normalization effectively reduce the impact of speaking rate variations on speaker recognition.
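The abstract does not give the normalization formula, so the following is only a minimal sketch of how global and local cohort score normalization could look, assuming a zero-normalization style rule (subtract the impostor mean, divide by the impostor standard deviation). The function names, the rate labels ("slow", "normal", "fast"), the fallback rule and the representation of the cohort as (rate label, score) pairs are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def normalize_score(raw_score, impostor_scores):
    """Shift and scale a raw score by the impostor score distribution.

    Assumption: a Z-norm style rule; the paper's exact formula is not
    stated in the abstract.
    """
    mu = mean(impostor_scores)
    # Guard against a zero standard deviation in a degenerate cohort.
    sigma = max(pstdev(impostor_scores), 1e-8)
    return (raw_score - mu) / sigma


def global_norm(raw_score, cohort):
    """Normalize with every impostor score, regardless of speaking rate.

    `cohort` is a list of (rate_label, score) pairs: each score comes from
    one cohort utterance scored against the enrolled speaker's model.
    """
    return normalize_score(raw_score, [score for _, score in cohort])


def local_norm(raw_score, test_rate, cohort):
    """Normalize with only the impostor scores that share the test rate."""
    local_scores = [score for rate, score in cohort if rate == test_rate]
    # Hypothetical fallback: too few local scores to estimate a spread,
    # so use the global cohort instead.
    if len(local_scores) < 2:
        return global_norm(raw_score, cohort)
    return normalize_score(raw_score, local_scores)
```

In a GMM-UBM system the raw score would typically be a log-likelihood ratio between the enrolled speaker model and the UBM. With a local cohort, a test utterance is compared only against impostor scores obtained at a similar speaking rate, which is the intuition behind splitting the global cohort set by relative rate.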