An Analysis of Speaker Diarization Fusion Methods for the First DIHARD Challenge

Bing Yin, Jun Du, Lei Sun, Xueyang Zhang, Shan He, Zhenhua Ling, Guoping Hu, Wu Guo
DOI: https://doi.org/10.23919/apsipa.2018.8659701
2018-01-01
Abstract: In this paper, we describe the fusion methods we explored for the first DIHARD challenge. To our knowledge, this is the first challenge in the speaker diarization domain that aims to evaluate state-of-the-art systems in realistic adverse acoustic environments. Besides speech preprocessing modules, including speech denoising and speech activity detection, our attention has been focused on back-end clustering algorithms, especially system fusion. Consensus clustering is adopted to combine the results from both original speech and denoised speech, in order to purify unreliable clusters. Moreover, score-level fusion is conducted between a GMM-UBM-based i-vector system and a CNN-based i-vector system. Finally, our system achieves a diarization error rate (DER) of 36.05% on the evaluation sets, which ranks second in the DIHARD challenge.
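
As an illustration of what score-level fusion between two systems can look like, here is a minimal sketch in Python. The abstract does not state the fusion rule, so the z-score normalization, the weighted sum, the `weight` parameter, and the example scores are all assumptions made for illustration, not the authors' method.

```python
def zscore(scores):
    """Normalize a flat list of scores to zero mean and unit variance."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # fall back to 1.0 if all scores are identical
    return [(s - mean) / std for s in scores]


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted sum of two normalized score lists over the same segment pairs.

    `weight` is a hypothetical tuning parameter: 1.0 trusts only system A,
    0.0 trusts only system B.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same segment pairs")
    za, zb = zscore(scores_a), zscore(scores_b)
    return [weight * a + (1.0 - weight) * b for a, b in zip(za, zb)]


# Hypothetical similarity scores for three segment pairs from each system.
# The two systems use different score ranges, which is why we normalize first.
gmm_ubm_scores = [2.0, -1.0, 0.5]
cnn_scores = [20.0, -10.0, 5.0]
fused = fuse_scores(gmm_ubm_scores, cnn_scores, weight=0.5)
```

The fused scores would then feed the clustering back-end in place of either system's scores alone. Normalizing before the weighted sum keeps the system with the larger raw score range from dominating.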