Improving Separation-Based Speaker Diarization Via Iterative Model Refinement And Speaker Embedding Based Post-Processing

Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee
DOI: https://doi.org/10.1109/icassp43922.2022.9746354
2022-05-23
Abstract: In this paper, we propose an iterative separation-based speaker diarization (ISSD) approach to cope with realistic data conditions. In the proposed ISSD, we iteratively generate adaptation data according to speaker priors and fine-tune the separation model, which leads to a gradual performance improvement. To further reduce the unavoidable speaker detection errors that errors in the priors cause in simple ISSD, we utilize speaker embedding information and propose two post-processing techniques, namely, speaker filtering and speaker recovery. We evaluate the diarization performance on the two-speaker conversational telephone speech (CTS) data set from the DIHARD-III Challenge. Compared with a state-of-the-art clustering-based speaker diarization (CSD) system, the proposed ISSD approach combined with the two post-processing schemes yields 47.72% and 46.97% relative diarization error rate reductions on the development and evaluation sets, respectively. ISSD is also one key contributing factor to the best-performing system in the DIHARD-III Challenge.
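The abstract reports its gains as relative diarization error rate (DER) reductions, that is, the drop in DER expressed as a fraction of the baseline DER. A minimal sketch of that arithmetic follows; the function name and the DER values are hypothetical and chosen only for illustration, since the abstract does not state the absolute DERs.

```python
def relative_reduction(baseline_der: float, proposed_der: float) -> float:
    """Return the relative DER reduction, in percent, of a proposed system
    over a baseline system. Both inputs are DERs in the same unit."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - proposed_der) / baseline_der


# Hypothetical DER values (in %), not taken from the paper.
example = relative_reduction(baseline_der=10.0, proposed_der=5.0)
print(f"{example:.2f}%")
```

Under this definition, a relative reduction close to 47% means the proposed system's DER is a little more than half of the baseline's.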