Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin
DOI: https://doi.org/10.21437/Interspeech.2022-602
2022-06-28
Abstract: The use of deep neural networks (DNNs) has dramatically improved the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. The Spoofing-Aware Speaker Verification (SASV) challenge was therefore designed and held to promote the development of systems that can perform ASV while accounting for spoofing attacks, by integrating ASV and spoofing countermeasure (CM) systems. In this paper, we propose two back-end systems: the multi-layer perceptron score fusion model (MSFM) and the integrated embedding projector (IEP). The MSFM, a score fusion back-end system, derives the SASV score from ASV and CM scores and embeddings. The IEP, on the other hand, combines ASV and CM embeddings into an SASV embedding and calculates the final SASV score based on cosine similarity. We effectively integrated ASV and CM systems through the proposed MSFM and IEP and achieved SASV equal error rates of 0.56% and 1.32% on the official evaluation trials of the SASV 2022 challenge.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses the vulnerability of Automatic Speaker Verification (ASV) systems to spoofing attacks. Methods based on Deep Neural Networks (DNNs) have significantly improved ASV performance over the past decade, but these systems can still be easily neutralized by spoofed input. The paper therefore proposes two back-end systems, the Multi-Layer Perceptron Score Fusion Model (MSFM) and the Integrated Embedding Projector (IEP), which combine ASV and Countermeasure (CM) systems so that speaker verification remains reliable in the presence of spoofing attacks. Both systems aim to improve performance on the Spoofing-Aware Speaker Verification (SASV) task through effective fusion strategies, without modifying or retraining the existing ASV and CM systems. MSFM generates the final SASV score from the scores and embeddings of the ASV and CM systems, while IEP combines the ASV and CM embeddings into SASV embeddings and calculates the final SASV score based on cosine similarity. With these two methods, the authors report SASV Equal Error Rates (EER) of 0.56% and 1.32% on the official evaluation trials of the SASV 2022 challenge.
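The two fusion strategies described above can be illustrated with a small sketch. This is not the authors' implementation: the layer sizes, the random (untrained) weights, the placeholder CM score, the choice of which embeddings feed the MLP, and all function names (`iep_score`, `msfm_score`, `random_layer`) are assumptions made only to show the data flow. The IEP-style path projects concatenated ASV and CM embeddings into a joint embedding and compares enrolment and test with cosine similarity; the MSFM-style path feeds scores and embeddings to a small multi-layer perceptron that outputs a single score.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def linear(x, weights, bias):
    """Fully connected layer: weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_layer(rng, in_dim, out_dim):
    """Hypothetical untrained layer; a real system would learn these weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def iep_score(enrol_asv, enrol_cm, test_asv, test_cm, projector):
    """IEP-style scoring (assumed form): project ASV+CM embeddings of each
    utterance into a joint embedding, then take the cosine similarity."""
    enrol_joint = linear(enrol_asv + enrol_cm, *projector)
    test_joint = linear(test_asv + test_cm, *projector)
    return cosine_similarity(enrol_joint, test_joint)


def msfm_score(asv_score, cm_score, enrol_asv, test_asv, test_cm, hidden, out):
    """MSFM-style scoring (assumed form): an MLP maps scores and embeddings
    to one fused score in (0, 1)."""
    features = [asv_score, cm_score] + enrol_asv + test_asv + test_cm
    h = relu(linear(features, *hidden))
    return sigmoid(linear(h, *out)[0])


# Toy data: random vectors stand in for real ASV and CM embeddings.
rng = random.Random(0)
dim_asv, dim_cm, dim_joint, dim_hidden = 8, 4, 6, 16


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


enrol_asv, enrol_cm = rand_vec(dim_asv), rand_vec(dim_cm)
test_asv, test_cm = rand_vec(dim_asv), rand_vec(dim_cm)

projector = random_layer(rng, dim_asv + dim_cm, dim_joint)
iep = iep_score(enrol_asv, enrol_cm, test_asv, test_cm, projector)

asv_score = cosine_similarity(enrol_asv, test_asv)
cm_score = 0.9  # placeholder value standing in for a CM system's output
hidden = random_layer(rng, 2 + 2 * dim_asv + dim_cm, dim_hidden)
out = random_layer(rng, dim_hidden, 1)
msfm = msfm_score(asv_score, cm_score, enrol_asv, test_asv, test_cm, hidden, out)

print("IEP-style score:", iep)
print("MSFM-style score:", msfm)
```

Because the weights here are random, the printed scores carry no meaning; the sketch only shows that both back-ends consume the outputs of frozen ASV and CM systems, which is what allows fusion without retraining them.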