A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions
Xin Fang, Zhen-Hua Ling, Lei Sun, Shu-Tong Niu, Jun Du, Cong Liu, Zhi-Chao Sheng
2021-01-01
Abstract: Recently, with the development of voiceprint technologies such as x-vectors, the performance of speaker diarization has improved considerably. However, assigning overlapping speech segments to the correct speakers remains a difficult problem. At the same time, speech separation has advanced substantially, especially with the end-to-end time-domain audio separation network (TasNet). Speaker diarization and speech separation are closely related in task definition: the former determines when each speaker is active over time, while the latter produces a separated speech signal for each speaker. In this paper, we take advantage of the complementarity between the two tasks and propose a speech separation guided diarization (SSGD) approach. To our knowledge, this is the first deep analysis of combining speaker diarization and speech separation methods. Moreover, we compare the architectures of several common speech separation models and analyze the robustness and generalization ability of the proposed method. By incorporating this method, the overall system achieved first place among all submitted systems in the DIHARD-III challenge.