Towards single integrated spoofing-aware speaker verification embeddings

Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

2023-06-01

Abstract: This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two requirements. First, they should reject non-target speakers' inputs as well as target speakers' spoofed inputs. Second, they should demonstrate competitive performance compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge. We attribute the inferior performance of single SASV embeddings to an insufficient amount of training data and the distinct nature of the ASV and CM tasks. To this end, we propose a novel framework that includes multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also exploited to address the lack of spoofed data. Experimental results show dramatic improvements, achieving a SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.

Audio and Speech Processing, Artificial Intelligence, Sound

What problem does this paper attempt to address?

The paper aims to develop a single integrated spoofing-aware speaker verification (SASV) embedding model that meets two requirements simultaneously:

1. **Reject non-target speaker inputs and spoofed inputs of target speakers**: The model must reject speech from non-target speakers and must also reject spoofed speech that imitates a target speaker.
2. **Competitive performance compared with fusion methods**: Fusing automatic speaker verification (ASV) and countermeasure (CM) embeddings outperformed single-embedding solutions by a large margin in the SASV2022 challenge. The goal is for a single integrated SASV embedding to be competitive with such fusion methods.

The authors attribute the poor performance of single SASV embeddings to two causes: an insufficient amount of training data, and the distinct nature of the ASV and CM tasks. To address both, they propose a framework that uses multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also used to compensate for the lack of spoofed data. Experimental results show a large improvement, reaching an SASV-EER (equal error rate) of 1.06% on the evaluation protocol of the SASV2022 challenge.
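To make the SASV-EER figure concrete, the following is a minimal sketch of how such a metric can be computed from trial scores. It assumes the SASV2022 convention that target trials are the only positive class, while non-target and spoofed trials are both treated as negatives. The function names `compute_eer` and `sasv_eer` are hypothetical, and the simple threshold sweep is an approximation for illustration, not the challenge's official scoring tool.

```python
import numpy as np


def compute_eer(positive_scores, negative_scores):
    """Approximate the equal error rate (as a fraction in [0, 1]).

    Assumes higher scores mean "accept". Sweeps every observed score as a
    threshold and returns the mean of the false acceptance and false
    rejection rates at the threshold where the two are closest.
    """
    pos = np.asarray(positive_scores, dtype=float)
    neg = np.asarray(negative_scores, dtype=float)
    thresholds = np.unique(np.concatenate([pos, neg]))  # sorted, de-duplicated
    # False rejection rate: positives scoring below the threshold.
    frr = np.array([(pos < t).mean() for t in thresholds])
    # False acceptance rate: negatives scoring at or above the threshold.
    far = np.array([(neg >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def sasv_eer(target_scores, nontarget_scores, spoof_scores):
    """SASV-EER sketch: targets are positives; non-target and spoofed
    trials are pooled into a single negative class (assumed convention)."""
    negatives = np.concatenate([
        np.asarray(nontarget_scores, dtype=float),
        np.asarray(spoof_scores, dtype=float),
    ])
    return compute_eer(target_scores, negatives)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    target = [0.9, 0.8, 0.3, 0.7]
    nontarget = [0.1, 0.2]
    spoof = [0.5, 0.75]
    print("ASV-only EER:", compute_eer(target, nontarget))
    print("SASV-EER:", sasv_eer(target, nontarget, spoof))
```

On the toy scores, the system separates targets from non-targets well, but the high-scoring spoofed trials raise the pooled error rate. This is the gap that a single SASV embedding has to close.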

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

SASV Challenge 2022: A Spoofing Aware Speaker Verification Challenge Evaluation Plan

SA-SASV: An End-to-End Spoof-Aggregated Spoofing-Aware Speaker Verification System

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches

Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification

The Vicomtech Spoofing-Aware Biometric System for the SASV Challenge

Backend Ensemble for Speaker Verification and Spoofing Countermeasure

On the potential of jointly-optimised solutions to spoofing attack detection and automatic speaker verification

SASV Based on Pre-trained ASV System and Integrated Scoring Module

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples

Integrated Replay Spoofing-Aware Text-Independent Speaker Verification

Can spoofing countermeasure and speaker verification systems be jointly optimised?