Multi-speaker Segmentation and Clustering of Telephone Speech

ZHANG Wei, LIU Jia
DOI: https://doi.org/10.3321/j.issn:1000-0054.2008.04.032
2008-01-01
Abstract: Multi-speaker segmentation and clustering of telephone speech was used to improve the quality of the extracted single-speaker speech. A segmentation-clustering-resegmentation scheme was developed to improve the performance of each step. The segmentation algorithm compares different distance metrics and uses a refinement scheme based on the Bayesian information criterion (BIC) to fuse the segmentation-point results. The clustering step uses a hierarchical clustering algorithm that combines the BIC and cross likelihood ratio (CLR) metrics. The resegmentation step uses an evolutionary hidden Markov model (EHMM) to refine the segmentation result. Tests on the National Institute of Standards and Technology (NIST) 1998 multi-speaker corpus show an overall improvement of 10% in cluster purity, the system performance indicator.
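The abstract does not state the BIC test itself. As a rough illustration only, the sketch below uses the commonly cited full-covariance Gaussian form of the delta-BIC change-point test, which may differ from the authors' formulation. The function names, the penalty weight `lam`, the `margin` parameter, and the small regularization term are all assumptions made for this example.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the ML covariance of the frames in x (n_frames, dim)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    # Small diagonal term (an assumption) keeps the matrix well conditioned.
    cov = cov + 1e-6 * np.eye(x.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, i, lam=1.0):
    """Delta-BIC for splitting x at frame i; values above 0 favour a speaker change."""
    n, d = x.shape
    x1, x2 = x[:i], x[i:]
    # Penalty for the extra mean and covariance parameters of the two-model hypothesis.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet_cov(x)
                  - len(x1) * logdet_cov(x1)
                  - len(x2) * logdet_cov(x2))
    return gain - penalty


def best_change_point(x, margin=20, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no split is supported."""
    n = x.shape[0]
    scores = [(delta_bic(x, i, lam), i) for i in range(margin, n - margin)]
    score, idx = max(scores)
    return (idx, score) if score > 0 else (None, score)


# Synthetic stand-in for acoustic features: two "speakers" with different means.
rng = np.random.default_rng(0)
two_speakers = np.vstack([rng.normal(0.0, 1.0, (200, 4)),
                          rng.normal(3.0, 1.0, (200, 4))])
one_speaker = rng.normal(0.0, 1.0, (400, 4))

change_idx, change_score = best_change_point(two_speakers)
no_change_idx, no_change_score = best_change_point(one_speaker)
```

In this toy setup the split is expected near frame 200 for the two-speaker data and no split is expected for the homogeneous data. Real telephone speech would use cepstral features and a sliding analysis window rather than a single pass over the whole signal.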