End-to-End Neural Speaker Diarization with Absolute Speaker Loss

Chao Wang, Jie Li, Xiang Fang, Jian Kang, Yongxiang Li
DOI: https://doi.org/10.21437/interspeech.2023-656
2023-01-01
Abstract: End-to-end neural speaker diarization (EEND) has proved to be a very promising approach to speaker diarization, especially for recordings with overlapping speech. In this paper, we propose a new approach to EEND that incorporates an absolute speaker loss function, which forces the network to consider global speaker identity information during training while keeping inference one-stage. In addition, we modify the pre-processing module so that feature splicing is no longer needed, which provides longer contextual information and supports longer input recordings at inference time. As a result, our proposed one-stage system achieves better results on simulated conversation-like LibriSpeech data sets than EEND-VC, a two-stage system. We evaluate with different chunking settings, recording durations, and overlap ratios, and achieve up to 70% relative improvement in DER over the baseline EEND-VC on short recordings and up to 7.5% on long recordings.
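The reported gains are relative, not absolute, reductions in diarization error rate (DER). The following is a minimal sketch of that arithmetic, assuming the standard definition of DER as missed speech, false alarm, and speaker confusion time divided by total reference speech time. The function names and all numbers are hypothetical illustrations and are not taken from the paper.

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization error rate: erroneous time over total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_improvement(baseline_der, proposed_der):
    """Fraction of the baseline DER removed by the proposed system."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - proposed_der) / baseline_der


# Hypothetical durations in seconds, chosen only to illustrate the calculation.
baseline = der(missed=6.0, false_alarm=2.0, confusion=2.0, total_speech=100.0)
proposed = der(missed=2.0, false_alarm=0.5, confusion=0.5, total_speech=100.0)

print(f"baseline DER: {baseline:.2%}")
print(f"proposed DER: {proposed:.2%}")
print(f"relative improvement: {relative_improvement(baseline, proposed):.0%}")
```

With these made-up values, the absolute DER drops by 7 percentage points while the relative improvement is about 70%, which is why the two figures should not be confused.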