Research on Score Domain Speaking Rate Normalization for Speaker Recognition

AISIKAER Rouzi, WANG Dong, LI Lantian, Thomas Fang Zheng, Xiaodong Zhang, Panshi Jin
2017-01-01
Abstract: Speaker recognition suffers serious performance degradation from the spectral distortion caused by different speaking rates. This paper proposes a score normalization approach to alleviate the impact of speaking rate variation. In the global normalization algorithm, all speech utterances at different speaking rates are used as a single cohort set, from which the score distribution of imposters is computed for each enrolled speaker; in the local normalization algorithm, the utterances in the cohort set are split into several subsets according to speaking rate, and the score distribution is estimated on each subset. During testing, the score of a test utterance against the claimed speaker is normalized using the estimated imposter score distribution of that speaker. To focus on speaking rate, a speech database named CSLT-SPRateDGT2016 was recorded, in which utterances at fast, slow and normal speaking rates were collected intentionally. All experiments were conducted on this database, using the well-known GMM-UBM framework. The experimental results show that the global and local score normalization methods proposed in this paper provide 17.77% and 4.58% relative EER reduction, respectively. Furthermore, to address the data scarcity problem, a data augmentation approach is proposed, in which utterances at different speaking rates are produced artificially by modifying the speaking rates of the original recordings. Experiments on the augmented database show clear performance gains, with 28.84% and 33.33% relative EER reduction for the global and local normalization methods, respectively.
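The difference between the global and local schemes can be illustrated with a minimal sketch. It assumes a zero-mean, unit-variance (Z-norm style) normalization; the abstract only states that the score is normalized using the estimated imposter score distribution, so the exact formula is an assumption. The function names, the rate labels, the toy scores, and the choice to select the local subset by the test utterance's speaking rate are likewise hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev

def imposter_stats(scores):
    """Mean and standard deviation of imposter scores for one enrolled speaker."""
    mu = mean(scores)
    sigma = pstdev(scores)
    # Guard against a zero deviation on tiny or constant cohorts.
    return mu, max(sigma, 1e-8)

def global_norm(raw_score, cohort_scores):
    """Normalize with statistics pooled over all speaking rates."""
    pooled = [s for scores in cohort_scores.values() for s in scores]
    mu, sigma = imposter_stats(pooled)
    return (raw_score - mu) / sigma

def local_norm(raw_score, cohort_scores, rate):
    """Normalize with statistics from a single speaking-rate subset.

    Assumption: the subset is chosen by the rate label of the test utterance.
    """
    mu, sigma = imposter_stats(cohort_scores[rate])
    return (raw_score - mu) / sigma

# Toy imposter scores of cohort utterances against one enrolled speaker,
# grouped by speaking rate (made-up numbers for illustration only).
cohort_scores = {
    "slow": [-1.0, 0.0, 1.0],
    "normal": [0.0, 1.0, 2.0],
    "fast": [2.0, 3.0, 4.0],
}

raw = 3.0
print("global:", global_norm(raw, cohort_scores))
print("local (fast):", local_norm(raw, cohort_scores, "fast"))
```

In this toy setup a raw score of 3.0 looks well above the pooled imposter mean, yet it sits exactly at the imposter mean of the fast subset, which is the kind of rate-dependent shift the local scheme is meant to capture.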