Abstract: Speaker recognition suffers from serious performance degradation due to the spectrum distortion caused by different speaking rates. This paper proposes a score normalization approach to alleviate the impact of speaking-rate variation. In the global normalization algorithm, all speech utterances at different speaking rates are used as a single cohort set, from which the score distribution of impostors is computed for each enrolled speaker; in the local normalization algorithm, the utterances in the cohort set are split into several subsets according to speaking rate, and the score distribution is estimated on each subset. At test time, the score of a test utterance against the claimed speaker is normalized using the estimated impostor score distribution of that claimed speaker. To focus on speaking rate, a speech database named CSLT-SPRateDGT2016 was recorded, in which utterances at fast, slow and normal speaking rates were collected intentionally. All experiments were conducted on this database, based on the well-known GMM-UBM framework. The experimental results show that the global and local score normalization methods proposed in this paper provide 17.77% and 4.58% relative EER reduction, respectively. Furthermore, to address the data scarcity problem, a data augmentation approach is proposed, in which utterances at different speaking rates are produced artificially by modifying the speaking rates of the original recordings. Experiments on the augmented database show clear performance gains, leading to 28.84% and 33.33% relative EER reduction with the global and local normalization methods, respectively.
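The two normalization variants described in the abstract can be illustrated with a minimal Python sketch. It assumes a Z-norm-style rule (subtract the impostor mean, divide by the impostor standard deviation) and assumes that the local variant selects the cohort subset matching a known or estimated speaking-rate label for the test utterance; the abstract does not state either detail, and the function names and data layout below are hypothetical.

```python
from statistics import mean, pstdev


def impostor_stats(scores, floor=1e-6):
    """Return (mean, std) of impostor scores; the std is floored to avoid division by zero."""
    return mean(scores), max(pstdev(scores), floor)


def global_norm(raw_score, cohort_scores):
    """Normalize with statistics from the whole cohort, ignoring speaking rate.

    cohort_scores: list of (rate_label, score) pairs, where each score is a
    cohort utterance scored against the claimed speaker's model.
    """
    mu, sigma = impostor_stats([s for _, s in cohort_scores])
    return (raw_score - mu) / sigma


def local_norm(raw_score, cohort_scores, test_rate):
    """Normalize with statistics from the cohort subset sharing the test rate label.

    Assumption: the matching subset is chosen by rate label. If that subset is
    too small, fall back to the global statistics.
    """
    subset = [s for r, s in cohort_scores if r == test_rate]
    if len(subset) < 2:
        return global_norm(raw_score, cohort_scores)
    mu, sigma = impostor_stats(subset)
    return (raw_score - mu) / sigma


# Toy cohort scores against one claimed speaker (made-up numbers for illustration).
cohort = [("slow", 1.0), ("slow", 3.0), ("fast", 5.0), ("fast", 7.0)]
print(global_norm(4.0, cohort))
print(local_norm(8.0, cohort, "fast"))
```

In this sketch the only difference between the two variants is which cohort scores feed the mean and standard deviation, which mirrors the global versus per-subset estimation described above.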

Score domain speaking rate normalization for speaker recognition

Research on Score Domain Speaking Rate Normalization for Speaker Recognition

Score Regulation Based on GMM Token Ratio Similarity for Speaker Recognition

Scores Selection for Emotional Speaker Recognition

Score Normalization-Based Speaking-Style Variation Robust Speaker Recognition

Pitch envelope based frame level score reweighed algorithm for emotion robust speaker recognition.

Score Normalization for Text-Independent Speaker Verification Systems

Universal background model reduction based efficient speaker recognition

Scoring Metrics of Assessing Voiceprint Distinctiveness Based on Speech Content and Rate

A Generative Model for Score Normalization in Speaker Recognition

A New Speaker Verification Method with Global Speaker Model and Likelihood Score Normalization

A Simulation Study on Optimal Scores for Speaker Recognition

Using score normalization to solve the score variation problem in face authentication

A Principle Solution for Enroll-Test Mismatch in Speaker Recognition

Speaker Normalization Training and Adaptation for Speech Recognition

Score Normalization for Demographic Fairness in Face Recognition

Feature Transformation for Speaker Verification under Speaking Rate Mismatch Condition

Remarks on Optimal Scores for Speaker Recognition

Towards Robustness to Speech Rate in Mandarin All-Syllable Recognition

Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

Statistical Thresholding for Robust ASR