GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023

Dongkeon Park,Ji Won Kim,Kang Ryeol Kim,Do Hyun Lee,Hong Kook Kim
2023-08-25
Abstract:This report describes the submission system by the GIST-AiTeR team for the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system focuses on implementing diverse speaker diarization (SD) techniques, including ResNet293 and MFA-Conformer with different combinations of segment and hop length. Then, those models are combined into an ensemble model. The ResNet293 and MFA-Conformer models exhibited the diarization error rates (DERs) of 3.65% and 3.83% on VAL46, respectively. The submitted ensemble model provided a DER of 3.50% on VAL46, and consequently, it achieved a DER of 4.88% on the VoxSRC-23 test set.
Audio and Speech Processing
What problem does this paper attempt to address?