Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

Oğuzhan Kurnaz, Selim Can Demirtaş, Aykut Büker, Jagabandhu Mishra, Cemal Hanilçi
2024-08-28
Abstract: This paper introduces the parallel network-based spoofing-aware speaker verification (SASV) system developed by BTU Speech Group for the ASVspoof5 Challenge. The SASV system integrates ASV and CM systems to enhance security against spoofing attacks. Our approach employs score and embedding fusion from ASV models (ECAPA-TDNN, WavLM) and CM models (AASIST). The fused embeddings are processed using a simple DNN structure, optimizing model performance with a combination of recently proposed a-DCF and BCE losses. We introduce a novel parallel network structure where two identical DNNs, fed with different inputs, independently process embeddings and produce SASV scores. The final SASV probability is derived by averaging these scores, enhancing robustness and accuracy. Experimental results demonstrate that the proposed parallel DNN structure outperforms traditional single DNN methods, offering a more reliable and secure speaker verification system against spoofing attacks.
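The parallel scoring idea in the abstract can be sketched in a few lines. What follows is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: the branch architecture (one ReLU hidden layer, no biases), the hidden size, the random initialization, how each branch's input vector is assembled from the ASV and CM embeddings, and the loss weight `lam` are all hypothetical. For the a-DCF part of the loss it assumes a sigmoid-smoothed detection cost (a common way to make a thresholded cost differentiable), with cost and prior values assumed to match the ASVspoof5 SASV setup.

```python
import math
import random
from typing import Sequence

# Assumed cost/prior values (taken to match the ASVspoof5 SASV a-DCF setup).
C_MISS, C_FA_NON, C_FA_SPF = 1.0, 10.0, 10.0
PI_TAR, PI_NON, PI_SPF = 0.9, 0.05, 0.05


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyDNN:
    """Hypothetical branch: one ReLU hidden layer and a sigmoid output (biases omitted)."""

    def __init__(self, in_dim: int, hidden: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]

    def __call__(self, x: Sequence[float]) -> float:
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return sigmoid(sum(w * hi for w, hi in zip(self.w2, h)))


class ParallelSASV:
    """Two branches with the same architecture; each sees its own input vector."""

    def __init__(self, in_dim: int, hidden: int = 16) -> None:
        # Different seeds stand in for independently trained weights.
        self.branch_a = TinyDNN(in_dim, hidden, seed=0)
        self.branch_b = TinyDNN(in_dim, hidden, seed=1)

    def __call__(self, x_a: Sequence[float], x_b: Sequence[float]) -> float:
        # Final SASV probability = mean of the two branch probabilities.
        return 0.5 * (self.branch_a(x_a) + self.branch_b(x_b))


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def soft_adcf(probs: Sequence[float], kinds: Sequence[str],
              tau: float = 0.5, alpha: float = 10.0) -> float:
    """Sigmoid-smoothed a-DCF; kinds are 'tar', 'non' (bona fide non-target) or 'spf'."""
    def rate(kind: str, sign: float) -> float:
        xs = [sigmoid(sign * alpha * (p - tau)) for p, k in zip(probs, kinds) if k == kind]
        return sum(xs) / len(xs) if xs else 0.0

    p_miss = rate("tar", -1.0)   # soft fraction of targets scored below tau
    p_fa_non = rate("non", 1.0)  # soft fraction of non-targets scored above tau
    p_fa_spf = rate("spf", 1.0)  # soft fraction of spoofs scored above tau
    return (C_MISS * PI_TAR * p_miss + C_FA_NON * PI_NON * p_fa_non
            + C_FA_SPF * PI_SPF * p_fa_spf)


def combined_loss(probs: Sequence[float], kinds: Sequence[str], lam: float = 1.0) -> float:
    # Only target trials are positives for the SASV decision.
    labels = [1 if k == "tar" else 0 for k in kinds]
    mean_bce = sum(bce(p, y) for p, y in zip(probs, labels)) / len(probs)
    return mean_bce + lam * soft_adcf(probs, kinds)


# Usage with random stand-in embeddings (8-dimensional, hypothetical).
rng = random.Random(42)
model = ParallelSASV(in_dim=8)
x_a = [rng.gauss(0.0, 1.0) for _ in range(8)]
x_b = [rng.gauss(0.0, 1.0) for _ in range(8)]
p_sasv = model(x_a, x_b)
print(f"SASV probability: {p_sasv:.4f}")
print(f"Loss on a toy batch: {combined_loss([0.9, 0.2, 0.1], ['tar', 'non', 'spf']):.4f}")
```

Averaging the two branch outputs keeps the result a valid probability, so a single decision threshold can still be applied, while each branch is free to specialize on its own input.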
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses the security of automatic speaker verification (ASV) systems against spoofing attacks. It proposes a new parallel network structure that strengthens the spoofing robustness of speaker verification. By combining ASV and countermeasure (CM) systems, the method aims to improve the robustness and accuracy of the system against spoofing attacks.

### Main contributions of the paper

1. **Parallel network structure**: The paper proposes a parallel structure with two identical DNNs (deep neural networks) that operate independently. The two DNNs are fed different inputs built from the ASV and CM embeddings, and each produces a spoofing-aware speaker verification (SASV) score. The final SASV probability is the average of the two outputs.
2. **Optimized loss function**: To train the parallel network, the paper combines the a-DCF (architecture-agnostic detection cost function) and BCE (binary cross-entropy) losses. The combination balances detection cost against classification accuracy, improving the overall performance of the model.
3. **Experimental verification**: Experiments on the ASVspoof5 challenge data show that the proposed parallel structure performs better on multiple metrics, most clearly outperforming the conventional single-DNN approach on the a-DCF metric.

### Problems solved

- **Detection of spoofing attacks**: Conventional ASV systems are vulnerable to spoofing attacks, which can lead to unauthorized access and security breaches. By integrating the ASV and CM systems, the paper improves the system's ability to detect spoofing attacks.
- **Optimization of embedding fusion**: Conventional embedding fusion methods may not handle heterogeneous embeddings (ASV and CM embeddings) effectively.
Through the parallel network structure, each DNN can focus on its own input features, which improves the robustness and accuracy of the system.

### Experimental results

- **Development and progress sets**: The proposed parallel network structure clearly outperforms the conventional single-DNN approach on both sets, particularly on the a-DCF metric.
- **Evaluation set**: The system using the parallel network structure (S11) achieved an a-DCF of 0.5130, and the system further improved by score fusion (S14) reached 0.4581; since a-DCF is a cost, the lower value is better.

In conclusion, by proposing a new parallel network structure and an optimized loss function, the paper improves the robustness of ASV systems against spoofing attacks and offers new ideas and technical support for the development of spoofing-aware speaker verification systems.
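Because the reported results are a-DCF values, it may help to see how such a cost is computed from SASV scores. The sketch below is an illustration under assumptions, not the challenge's official scoring tool: the cost and prior values are assumed to follow the ASVspoof5 SASV setup and should be checked against the official evaluation plan, the trial-type labels `'tar'`, `'non'` and `'spf'` are hypothetical names, and normalizing by the cheaper of the two trivial systems (accept all or reject all) is the convention assumed here. It returns the minimum normalized value over all thresholds, which is one common way of reporting the metric.

```python
from typing import Callable, Sequence

# Assumed SASV a-DCF parameters (taken to follow the ASVspoof5 evaluation plan).
C_MISS, C_FA_NON, C_FA_SPF = 1.0, 10.0, 10.0
PI_TAR, PI_NON, PI_SPF = 0.9, 0.05, 0.05


def adcf_at(scores: Sequence[float], kinds: Sequence[str], tau: float) -> float:
    """Unnormalized a-DCF when trials with score >= tau are accepted."""
    def frac(kind: str, hit: Callable[[float], bool]) -> float:
        xs = [s for s, k in zip(scores, kinds) if k == kind]
        return sum(1 for s in xs if hit(s)) / len(xs) if xs else 0.0

    p_miss = frac("tar", lambda s: s < tau)      # targets rejected
    p_fa_non = frac("non", lambda s: s >= tau)   # bona fide non-targets accepted
    p_fa_spf = frac("spf", lambda s: s >= tau)   # spoofed trials accepted
    return (C_MISS * PI_TAR * p_miss + C_FA_NON * PI_NON * p_fa_non
            + C_FA_SPF * PI_SPF * p_fa_spf)


def min_norm_adcf(scores: Sequence[float], kinds: Sequence[str]) -> float:
    """Minimum a-DCF over thresholds, divided by the cost of the best trivial system."""
    default = min(C_MISS * PI_TAR, C_FA_NON * PI_NON + C_FA_SPF * PI_SPF)
    # Every observed score is a candidate threshold; +inf means "reject everything".
    thresholds = sorted(set(scores)) + [float("inf")]
    return min(adcf_at(scores, kinds, t) for t in thresholds) / default


# Toy trials (hypothetical scores, not results from the paper).
kinds = ["tar", "tar", "non", "spf"]
print(min_norm_adcf([0.9, 0.8, 0.2, 0.1], kinds))  # perfectly separated → 0.0
print(min_norm_adcf([0.6, 0.4, 0.5, 0.3], kinds))  # overlapping → 0.5
```

A normalized value of 1.0 means the system is no better than always rejecting or always accepting, so values such as 0.5130 and 0.4581 indicate a substantial reduction relative to that trivial baseline under this convention.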