Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition

Jing Deng, Thomas Fang Zheng, Zhanjiang Song, Jian Liu
DOI: https://doi.org/10.21437/interspeech.2005-636
2005-01-01
Abstract: The Gaussian mixture model-universal background model (GMM-UBM) has been dominant in text-independent speaker recognition tasks. However, the conventional GMM-UBM method assumes that each Gaussian mixture is independent and ignores the useful high-level speaker-dependent characteristics, such as word usage or speaking habits, that exist across Gaussian mixtures. Building on the GMM-UBM method, we propose using Gaussian mixture correlation to model high-level information for speaker recognition tasks. In this method, we first cluster the Gaussian mixtures of the UBM into a small number of classes according to their mean vectors; we then learn a universal class transition probability matrix (UCTPM), which helps model the high-level speaker characteristics embedded in Gaussian mixture correlation. During the training phase, a speaker-dependent class transition probability matrix is adapted from the UCTPM. Experiments on two different databases show an average error rate reduction (ERR) of 20.38% compared with the conventional GMM-UBM method.
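The pipeline outlined in the abstract (cluster UBM means into classes, estimate a class transition matrix, adapt it per speaker) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the clustering algorithm, how frames are mapped to mixtures, or the adaptation rule. The k-means clustering, the nearest-mean frame assignment, the additive smoothing, the MAP-style interpolation with a `relevance` factor, and all function names below are assumptions made for illustration.

```python
import numpy as np

def cluster_means(means, n_classes, n_iter=10):
    """Group UBM mean vectors into classes with plain k-means (assumed choice)."""
    # Deterministic initialisation: evenly spaced means serve as initial centroids.
    idx = np.linspace(0, len(means) - 1, n_classes).astype(int)
    centroids = means[idx].copy()
    labels = np.zeros(len(means), dtype=int)
    for _ in range(n_iter):
        dists = ((means[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_classes):
            if np.any(labels == k):  # keep the old centroid if a class is empty
                centroids[k] = means[labels == k].mean(axis=0)
    return labels

def frames_to_classes(frames, means, mixture_class):
    """Map each frame to the class of its nearest mixture mean.

    Assumption: nearest mean stands in for the top-scoring Gaussian,
    ignoring mixture weights and covariances.
    """
    dists = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return mixture_class[dists.argmin(axis=1)]

def count_transitions(class_seq, n_classes):
    """Count transitions between consecutive class indices."""
    counts = np.zeros((n_classes, n_classes))
    for a, b in zip(class_seq[:-1], class_seq[1:]):
        counts[a, b] += 1.0
    return counts

def universal_matrix(class_seq, n_classes, smoothing=1.0):
    """Row-normalised transition matrix with additive smoothing (assumed)."""
    counts = count_transitions(class_seq, n_classes) + smoothing
    return counts / counts.sum(axis=1, keepdims=True)

def adapt_matrix(class_seq, uctpm, relevance=16.0):
    """MAP-style interpolation of speaker counts with the universal matrix.

    Assumption: the abstract only says the speaker matrix is "adapted
    from the UCTPM"; this particular rule is hypothetical.
    """
    counts = count_transitions(class_seq, uctpm.shape[0])
    return (counts + relevance * uctpm) / (counts.sum(axis=1, keepdims=True) + relevance)
```

With this rule, a class that the speaker's data never visits keeps its universal transition probabilities, while a frequently visited class moves toward the speaker's own transition counts. Each row of the adapted matrix still sums to one as long as the rows of the universal matrix do.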