Abstract: Joint factor analysis (JFA) is a recently developed method to model speaker and session variability in Gaussian Mixture Models (GMMs). In this paper, both batch and sequential Bayesian analysis of JFA models are evaluated for robust speaker recognition. Various sources of uncertainty in JFA models, from latent speaker and channel factors to Gaussian mixture indicator variables, are examined from a Bayesian perspective. By integrating over all these latent factors, we can account for the sources of variability in the speaker enrollment and verification processes better than by considering only point estimates; through this study, we can also analyze and identify the contribution of these underlying model uncertainties to the final speaker verification performance. However, because all latent variables in the JFA GMM become correlated with each other given the observed data, Bayesian analysis in closed analytic form becomes practically intractable. Hence, an alternative approach based on variational Bayes is developed in this paper to explore Bayesian JFA models in an approximate yet efficient way. In this method, the fully correlated a posteriori distribution is approximated by a variational distribution of factored form to facilitate inference, and a lower bound on the model likelihood is derived to construct detection scores. Experimental results on the 2008 NIST Speaker Recognition Evaluation (NIST SRE) show that these variational Bayesian JFA models obtain significant performance improvements over JFA using point estimates, especially in cases with limited enrollment and test data. For the 10-s task in the 2008 NIST SRE, the variational Bayesian JFA systems obtained relative reductions of 9.4% in EER and 11.5% in DCF compared to the baseline JFA system. This paper also shows the importance of taking into account the uncertainties in both speaker and channel factors, which is more effective than considering uncertainties in channel factors alone.
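
For orientation, the factored approximation and the likelihood lower bound mentioned in the abstract can be written in generic mean-field form. This is only a sketch under stated assumptions, not the paper's own derivation: the abstract introduces no notation, so the symbols below are assumed from common JFA usage (speaker- and session-dependent supervector M, speaker-independent mean m, speaker factors y, channel factors x, residual z, mixture indicators c, observations O), and the particular grouping of factors in q is one plausible choice.

```latex
% Assumed notation (not taken from the abstract):
%   M: GMM mean supervector, m: speaker-independent mean,
%   V, U, D: speaker, channel, and residual loading matrices,
%   y, x, z: latent factors, c: mixture indicators, O: observed features.
\begin{align}
  % JFA decomposition of the supervector
  M &= m + V y + U x + D z \\
  % Mean-field (factored) approximation of the correlated posterior
  p(x, y, z, c \mid O) &\approx q(x, y, z, c) = q(x)\, q(y)\, q(z)\, q(c) \\
  % Lower bound on the log-likelihood; the gap is KL(q || p(. | O))
  \log p(O) &\ge \mathbb{E}_{q}\!\left[\log p(O, x, y, z, c)\right]
             - \mathbb{E}_{q}\!\left[\log q(x, y, z, c)\right]
\end{align}
```

The bound holds for any choice of q and is tight only when q equals the true posterior, which is why it can stand in for the intractable likelihood when detection scores are constructed.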

Model Session Variability in Speaker Verification

Simplified factor analysis in speaker verification

Speaker Verification Based on Factor Analysis and SVM

A PCA Method Based on Speaker Session Variability

Compensation of Intrinsic Variability with Factor Analysis Modeling for Robust Speaker Verification

Total Variability Factors Combination for Speaker Verification

Applying Emotional Factor Analysis And I-Vector To Emotional Speaker Recognition

Affect-Insensitive Speaker Recognition by Feature Variety Training

VarASV: Enabling Pitch-variable Automatic Speaker Verification Via Multi-task Learning

Factor Analysis in GMM-Based Language Identification

Factor analysis based spatial correlation modeling for speaker verification

Factor Analysis and Space Assembling in Speaker Recognition

Session Variability Subspace Projection Based Model Compensation for Speaker Verification

Factored covariance modeling for text-independent speaker verification

Variational Bayesian Joint Factor Analysis Models for Speaker Verification

Eigenvoice Factor Analysis in Short Time Speaker Recognition

Eigenchannel Compensation and Symmetric Score for Robust Text-Independent Speaker Verification

Using MMSE to Improve Session Variability Estimation

An SIPCA-WCCN method for SVM-based speaker verification system

Local Variability Modeling for Text-Independent Speaker Verification

Exploration of Local Variability in Text-Independent Speaker Verification