Abstract: In this paper, we present our system for Task 2 of the Short-duration Speaker Verification (SdSV) Challenge 2021. This task focuses on benchmarking and analyzing short-duration speaker recognition systems under varying degrees of phonetic variability. The main difficulty lies in the variability across cross-lingual trials, along with the limited in-domain Farsi training data. Building on the state-of-the-art ResNetSE speaker embedding network, we propose a novel network architecture combined with in-domain data fine-tuning and novel scoring methods, achieving significant improvements over the ResNetSE baselines. Furthermore, duration-based score calibration effectively improves robustness. Finally, our system, a fusion of 10 subsystems, achieves a MinDCF of 0.0394 and an EER of 0.84% on the SdSVC evaluation set.

The Sogou System for Short-duration Speaker Verification Challenge 2021