An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments

Meng-Zhen Li, Xiao-Lei Zhang
DOI: https://doi.org/10.23919/apsipa.2018.8659665
2018-01-01
Abstract: Speaker clustering is an important component of speech processing tasks such as speaker diarization. However, its behavior in adverse acoustic environments has not been studied comprehensively. To address this gap, we investigate each of its components separately. A speaker clustering system contains three components: a feature extraction front-end, a dimensionality reduction algorithm, and a clustering back-end. In this paper, we use the standard Gaussian mixture model based universal background model (GMM-UBM) as the front-end to extract high-dimensional supervectors, and compare three dimensionality reduction algorithms and two clustering algorithms. The three dimensionality reduction algorithms are principal component analysis (PCA), spectral clustering (SC), and the multilayer bootstrap network (MBN). The two clustering algorithms are k-means and agglomerative hierarchical clustering (AHC). We conducted extensive experiments in both in-domain and out-of-domain settings on noisy versions of the NIST 2006 speaker recognition evaluation (SRE) and NIST 2008 SRE corpora. Experimental results in various noisy environments show that (i) the MBN-based systems perform best in most cases, while the SC-based systems outperform both the PCA-based systems and the systems that use the original supervectors; and (ii) AHC is more robust than k-means.
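
The three-stage pipeline described in the abstract (front-end, dimensionality reduction, clustering back-end) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic Gaussian vectors as a stand-in for GMM-UBM supervectors, uses PCA as the reduction step (one of the three methods compared), and uses a naive average-linkage AHC as the back-end; the linkage choice, the function names `pca_reduce` and `ahc_average_linkage`, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def ahc_average_linkage(Z, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Cubic-time in the number of points, so only suitable for small inputs.
    Average linkage is an assumption; the abstract does not name a linkage.
    """
    clusters = [[i] for i in range(len(Z))]
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Synthetic stand-in for supervectors: 3 "speakers", 10 utterances each,
# 50 dimensions. Real supervectors would come from a GMM-UBM front-end.
rng = np.random.default_rng(0)
n_speakers, n_utts, dim = 3, 10, 50
centers = rng.normal(scale=5.0, size=(n_speakers, dim))
true_speakers = np.repeat(np.arange(n_speakers), n_utts)
supervectors = centers[true_speakers] + rng.normal(scale=0.5, size=(n_speakers * n_utts, dim))

reduced = pca_reduce(supervectors, n_components=2)
labels = ahc_average_linkage(reduced, n_clusters=n_speakers)
```

In this sketch the clusters are well separated by construction, so it says nothing about robustness to noise; swapping `pca_reduce` for another reducer or `ahc_average_linkage` for k-means would follow the same interface.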