ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.

Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee
DOI: https://doi.org/10.1109/taslp.2023.3265199
2023-01-01
Abstract: In this paper, we propose a neural speaker diarization (NSD) network architecture consisting of three key components. First, a memory-aware multi-speaker embedding (MA-MSE) mechanism is proposed to facilitate dynamic refinement of speaker embeddings and reduce a potential data mismatch between the speaker embedding extraction and the NSD network. Next, a speaker selection procedure is introduced to handle situations where the detected number of speakers differs from the number of speakers assumed by the NSD network. Finally, an adaptive procedure is proposed to improve, at each iteration, the prior information required for the non-overlapping speech segments in a given utterance. We call our proposed framework adaptive neural speaker diarization with memory-aware multi-speaker embedding (ANSD-MA-MSE). Our method improves diarization performance in realistic operating scenarios, such as adverse acoustic environments, domain mismatches, and a varying, rather than fixed, number of speakers. Tested on both the AMI corpus and the DIHARD-III evaluation sets, our proposed approach consistently outperforms other state-of-the-art techniques in diarization error rate, including the results reported by the best single-model system in the DIHARD-III challenge.