Abstract: Speaker recognition suffers serious performance degradation from the spectral distortion caused by different speaking rates. This paper proposes a score normalization approach to alleviate the impact of speaking-rate variation. In the global normalization algorithm, all speech utterances at different speaking rates are used as a single cohort set, from which the score distribution of imposters is computed for each enrolled speaker; in the local normalization algorithm, the utterances in the cohort set are split into several subsets according to speaking rate, and the score distribution is estimated on each subset. At test time, the score of a test utterance against the claimed speaker is normalized using the estimated imposter score distribution of the claimed speaker. To focus on the speaking rate, a speech database named CSLT-SPRateDGT2016 was recorded, in which utterances at fast, slow and normal speaking rates were collected intentionally. All experiments were conducted on this database, based on the well-known GMM-UBM framework. The experimental results show that the global and local score normalization methods proposed in this paper provide 17.77% and 4.58% relative EER reduction, respectively. Furthermore, to address the data scarcity problem, a data augmentation approach is proposed, in which utterances at different speaking rates are produced artificially by modifying the speaking rates of the original recordings. Experiments on the augmented database show clear performance gains, leading to 28.84% and 33.33% relative EER reduction with the global and local normalization methods, respectively.
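The two normalization schemes described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the normalization formula, so this sketch assumes a Z-norm-style form, (score - mean) / standard deviation, computed from the cohort (imposter) scores against the claimed speaker's model. It also assumes that the speaking-rate label of the test utterance is known when the local variant is applied. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def imposter_stats(scores):
    """Mean and sample standard deviation of a list of imposter scores."""
    return mean(scores), stdev(scores)


def global_norm(raw_score, cohort_scores):
    """Normalize with statistics pooled over all speaking rates.

    cohort_scores maps a speaking-rate label to the scores of cohort
    utterances at that rate against the claimed speaker's model.
    Assumption: Z-norm-style normalization.
    """
    pooled = [s for scores in cohort_scores.values() for s in scores]
    mu, sigma = imposter_stats(pooled)
    return (raw_score - mu) / sigma


def local_norm(raw_score, cohort_scores, rate):
    """Normalize with statistics from the cohort subset of one speaking rate.

    Assumption: the rate label of the test utterance is known.
    """
    mu, sigma = imposter_stats(cohort_scores[rate])
    return (raw_score - mu) / sigma


# Toy cohort scores for one enrolled speaker (made-up numbers).
cohort = {
    "slow": [-2.0, -1.0, 0.0],
    "normal": [0.0, 1.0, 2.0],
    "fast": [2.0, 3.0, 4.0],
}

global_score = global_norm(1.0, cohort)
local_score = local_norm(4.0, cohort, "fast")
```

In this toy setup the per-rate subsets have different means, so the same raw score is mapped to different normalized values depending on which subset supplies the statistics. That is the distinction between the global and local variants; the actual estimator used in the paper may differ.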

The speaking rate adaptation algorithm in Putonghua continuous speech recognition

Improved algorithm with duration information for continuous speech recognition

A study of duration in continuous speech recognition based on DDBHMM

Towards Robustness to Speech Rate in Mandarin All-Syllable Recognition

Continuous Speech Recognition Based on the Triphone DDBHMM

Algorithm for Mandarin Continuous Speech Recognition Based on Context-Dependent Unit Between Syllables

An inhomogeneous HMM speech recognition algorithm

Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition

A Fast Error-tolerant Algorithm in Decoding Module of Speech Recognition

An Efficient Computation Algorithm In Mandarin Continuous Speech Recognition

One-stage Search Algorithm for Large Vocabulary Continuous Speech Recognition Based on DDBHMM

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Continuous speech recognition method and continuous speech recognition system

Research on Score Domain Speaking Rate Normalization for Speaker Recognition

Speaking Rate Normalization With Lattice-Based Context-Dependent Phoneme Duration Modeling For Personalized Speech Recognizers On Mobile Devices

Speech Recognition Using Speaker Adaptation by System Parameter Transformation

Application of dynamic time warping optimization algorithm in speech recognition of machine translation

Speaker Adaptation with MAP Estimation and Weighted Neighbor Regression

A New Topic-Based Language Model Adaptation

Probabilistic Speaker-Class Based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition

A Speaker Adaptation Algorithm Using Principal Curves in Noisy Environments