Speaker Clustering of Telephone Speech Based on Front-End Factor Analysis

Kui WU, Yan SONG, Li-Rong DAI
DOI: https://doi.org/10.3969/j.issn.1003-6059.2013.01.001
2013-01-01
Abstract: Existing speaker clustering methods based on the Gaussian mixture model (GMM) mainly obtain each cluster's GMM by adapting it from a universal background model (UBM). However, when a cluster contains little data, this adaptation yields poorly estimated models. This paper explores two factor analysis modeling methods, one based on eigenvoice (EV) space analysis and the other on total variability (TV) space analysis. By modeling a variability space when estimating the clusters' GMMs, both methods greatly reduce the number of parameters that must be estimated. Experimental results on two-speaker telephone data from the 2008 NIST Speaker Recognition Evaluation show that both proposed methods achieve a considerable reduction in speaker error rate compared with the baseline system using maximum a posteriori (MAP) adaptation, and that the method based on TV space analysis obtains a lower speaker error rate than the method based on EV space analysis.
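The parameter reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common formulations of relevance-MAP mean adaptation and of a low-rank supervector model M = m + T w. The function names, the relevance factor of 16, and the sizes used in the usage note (512 components, 39-dimensional features, rank 400) are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def map_adapt_means(ubm_means, post, feats, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means only (assumed baseline form).

    ubm_means: (C, D) UBM component means
    post:      (N, C) frame-level component posteriors
    feats:     (N, D) feature frames assigned to one cluster
    """
    n = post.sum(axis=0)                       # zero-order statistics, shape (C,)
    f = post.T @ feats                         # first-order statistics, shape (C, D)
    alpha = (n / (n + relevance))[:, None]     # data-dependent adaptation weight
    data_mean = f / np.maximum(n, 1e-10)[:, None]
    # Components with little or no data stay close to the UBM means.
    return alpha * data_mean + (1.0 - alpha) * ubm_means

def factor_supervector(ubm_means, T, w):
    """Cluster means from a low-rank model M = m + T w (assumed EV/TV form).

    T: (C*D, R) variability matrix, trained beforehand on other data
    w: (R,)     the only cluster-specific quantity to estimate
    """
    m = ubm_means.reshape(-1)                  # UBM mean supervector, shape (C*D,)
    return (m + T @ w).reshape(ubm_means.shape)

def free_parameters(num_components, feat_dim, rank):
    """Per-cluster quantities to estimate: (MAP means, factor vector)."""
    return num_components * feat_dim, rank
```

With the hypothetical sizes above, `free_parameters(512, 39, 400)` returns `(19968, 400)`: MAP adaptation must estimate every component mean from the cluster's own data, while the factor model estimates only the low-dimensional vector `w`.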