Abstract: One of the most important challenges in speaker recognition is intersession variability (ISV), caused primarily by cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario in which training conversations from multilingual speakers are collected in a number of other languages, leading to a further decline in performance. One important reason is that more and more researchers use phonetic clustering to introduce high-level information into speaker recognition, but such language-dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system whose features are the maximum likelihood linear regression (MLLR) transforms that adapt a universal background model (UBM). We first introduce a novel language-independent statistical binary decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge-based one. We also construct a channel compensation framework that combines latent factor analysis (LFA) in the feature domain with nuisance attribute projection (NAP) in the model domain, based on an MLLR supervector kernel. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture model (GMM) and SVM systems, and combine them for a further improvement.
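The abstract names MLLR supervectors and model-domain NAP but gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes that each session is represented by concatenating the rows of its MLLR transforms [A | b] (one per regression class) into a supervector, that the nuisance subspace U is spanned by the top-k eigenvectors of the within-speaker scatter of those supervectors, and that NAP applies P = I - U U^T before a linear SVM kernel. All function names are hypothetical, and power iteration stands in for a proper eigensolver. Feature-domain LFA and the decision tree are not covered.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mllr_supervector(transforms: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Concatenate the rows of each [A | b] MLLR transform (one per class) into a supervector."""
    sv: Vector = []
    for A, b in transforms:
        for row, bias in zip(A, b):
            sv.extend(row)   # one row of the linear part A
            sv.append(bias)  # the matching entry of the bias vector b
    return sv


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def within_speaker_scatter(sessions: Dict[str, List[Vector]]) -> Matrix:
    """Sum of d d^T, where d is a session supervector minus its speaker's mean."""
    dim = len(next(iter(sessions.values()))[0])
    W = [[0.0] * dim for _ in range(dim)]
    for vecs in sessions.values():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        for v in vecs:
            d = [x - m for x, m in zip(v, mean)]
            for i in range(dim):
                for j in range(dim):
                    W[i][j] += d[i] * d[j]
    return W


def top_eigenvectors(W: Matrix, k: int, iters: int = 200, seed: int = 0) -> List[Vector]:
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration.

    Each new vector is kept orthogonal to those already found; fewer than k
    vectors are returned if W has no variance left outside the found subspace.
    """
    rng = random.Random(seed)
    basis: List[Vector] = []
    for _ in range(k):
        u = [rng.gauss(0.0, 1.0) for _ in range(len(W))]
        for _ in range(iters):
            u = [dot(row, u) for row in W]
            for b in basis:  # Gram-Schmidt against earlier eigenvectors
                c = dot(u, b)
                u = [x - c * y for x, y in zip(u, b)]
            norm = math.sqrt(dot(u, u))
            if norm < 1e-12:
                return basis
            u = [x / norm for x in u]
        basis.append(u)
    return basis


def nap_project(x: Vector, basis: List[Vector]) -> Vector:
    """Apply P = I - U U^T, removing the components of x along the nuisance directions."""
    out = list(x)
    for u in basis:
        c = dot(out, u)
        out = [a - c * b for a, b in zip(out, u)]
    return out


def nap_kernel(x: Vector, y: Vector, basis: List[Vector]) -> float:
    """Linear SVM kernel evaluated on NAP-compensated supervectors."""
    return dot(nap_project(x, basis), nap_project(y, basis))
```

In a toy setup where two sessions of the same speaker differ only along one "channel" dimension, the estimated nuisance direction is that dimension. After projection the two sessions coincide, while the speaker-dependent dimensions are untouched. In practice U would be estimated on a development set of speakers recorded over several channels, and the same projection would then be applied to both training and test supervectors before SVM training and scoring. A real system would also normalize supervector dimensions and use a linear-algebra library rather than pure-Python loops.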

Maximum Likelihood Sub-band Linear Regression for Robust Speech Recognition

Linguistic Feedback Supports Rapid Adaptation to Acoustically Degraded Speech

Improving Online Incremental Speaker Adaptation with Eigen Feature Space MLLR

Emotional speaker verification with linear adaptation

Model Adaptation for HMM-Based Speech Synthesis under Minimum Generation Error Criterion

Speaker adaptation using maximum likelihood model interpolation

Robust Speech Recognition Based on Spectral Adjusting and Warping

Model Adaptation Using the Projection to Latent Structure Algorithm

Replacing Uncertainty Decoding with Subband Re-Estimation for Large Vocabulary Speech Recognition in Noise

Speaker Adaptation with MAP Estimation and Weighted Neighbor Regression

Multi-Channel Feature Adaptation for Robust Speech Recognition

A New Subspace Based Speaker Adaptation Method

Adapting noisy speech models — Extended uncertainty decoding

Speaker adaptation based on combination of MAP estimation and weighted neighbor regression

Robust Speech Recognition by Selecting Mel-Filter Banks

Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition

MAP-based Speaker Adaptation in Speech Synthesis

Rapid Speaker Adaptation Using Multi-Stream Structural Maximum Likelihood Eigenspace Mapping

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement

Speaker Adaptation for Telephony Data Using Speaker Clustering