Abstract: Joint factor analysis (JFA) is a recently developed method for modeling speaker and session variability in Gaussian mixture models (GMMs). In this paper, both batch and sequential Bayesian analyses of JFA models are evaluated for robust speaker recognition. Various sources of uncertainty in JFA models, from latent speaker and channel factors to Gaussian mixture indicator variables, are examined from a Bayesian perspective. Integrating over all these latent factors accounts for the sources of variability in speaker enrollment and verification better than considering only point estimates; it also allows us to analyze and identify the contribution of each underlying model uncertainty to the final speaker verification performance. However, because all latent variables in a JFA GMM become correlated with each other given the observed data, Bayesian analysis in closed analytic form is practically intractable. Hence, an alternative approach based on variational Bayes is developed in this paper to explore Bayesian JFA models in an approximate yet efficient way. In this method, the fully correlated a posteriori distribution is approximated by a variational distribution of factored form to facilitate inference, and a lower bound on the model likelihood is derived to construct detection scores. Experimental results on the 2008 NIST Speaker Recognition Evaluation (NIST SRE) show that these variational Bayesian JFA models obtain significant performance improvements over JFA using point estimates, especially in cases with limited enrollment and test data. For the 10-s task in the 2008 NIST SRE, the variational Bayesian JFA systems obtained relative reductions of 9.4% in EER and 11.5% in DCF compared to the baseline JFA system. This paper also shows the importance of taking into account the uncertainties in both speaker and channel factors, which is more effective than considering uncertainties in channel factors alone.
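
The factored approximation and the likelihood lower bound mentioned in the abstract can be sketched in generic variational Bayes notation. This is only an illustrative sketch, not the paper's exact formulation: the symbols $O$ (observed features), $y$ (speaker factors), $x$ (channel factors), $Z$ (Gaussian mixture indicators), and the three-way factorization of $q$ are assumptions chosen here for illustration, since the abstract does not specify them.

```latex
% Assumed notation: O = observed data, y = speaker factors,
% x = channel factors, Z = mixture indicator variables.

% Mean-field (factored) approximation of the correlated posterior:
\[
  p(y, x, Z \mid O) \;\approx\; q(y, x, Z) \;=\; q(y)\, q(x)\, q(Z)
\]

% Lower bound on the log-likelihood (Jensen's inequality):
\[
  \log p(O) \;\ge\; \mathcal{L}(q)
  \;=\; \mathbb{E}_{q}\!\left[\log p(O, y, x, Z)\right]
  \;-\; \mathbb{E}_{q}\!\left[\log q(y, x, Z)\right]
\]

% The gap is the KL divergence from q to the true posterior:
\[
  \log p(O) - \mathcal{L}(q)
  \;=\; \mathrm{KL}\!\left(q(y, x, Z)\,\|\,p(y, x, Z \mid O)\right) \;\ge\; 0
\]
```

Under these assumptions, maximizing $\mathcal{L}(q)$ over the factors of $q$ tightens the bound, and the resulting value of $\mathcal{L}(q)$ can stand in for the intractable log-likelihood when a detection score is formed.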

Speaker identification under mismatched speaking manner based on joint factor analysis

A Joint Factor Analysis Approach to Whispering Speaker Identification under Mismatched Speaking Manners and Channels

Joint Factor Analysis of Channel Mismatch in Whispering Speaker Verification

Whispered Speaker Identification Based on Factor Analysis and Feature Mapping

Whispered Speech Speaker Identification Based on SVM and FA

A study on speaker and session variability in speaker recognition of Chinese whispered speech

Whispered Speaker Identification Based on Feature and Model Hybrid Compensation

Applying Emotional Factor Analysis And I-Vector To Emotional Speaker Recognition

Learning Virtual HD Model for Bi-model Emotional Speaker Recognition

On the use of phase information-based joint factor analysis for speaker verification under channel mismatch condition

Whispering Speaker Identification

Speaker Identification with Whispered Speech Using Unvoiced-Consonant Phonemes

Speaker Identification with Distant Microphone Speech

Variational Bayesian Joint Factor Analysis Models for Speaker Verification

Factor Analysis and Space Assembling in Speaker Recognition

Mismatched Feature Detection with Finer Granularity for Emotional Speaker Recognition

Orthogonal subspace combination based on the joint factor analysis for text-independent speaker recognition

Eigenchannel Space Combination Method of Joint Factor Analysis

Robust Far-Field Speaker Identification under Mismatched Conditions

Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification

Speaker Identification Using Warped MVDR Cepstral Features