Abstract: Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. We propose a spoofing-robust ASV system optimized directly for the recently introduced architecture-agnostic detection cost function (a-DCF), which allows targeting a desired trade-off between the contradicting aims of user convenience and robustness to spoofing. We combine a-DCF and binary cross-entropy (BCE) with a novel straightforward threshold optimization technique. Our results with an embedding fusion system on ASVspoof2019 data demonstrate relative improvement of $13\%$ over a system trained using BCE only (from minimum a-DCF of $0.1445$ to $0.1254$). Using an alternative non-linear score fusion approach provides relative improvement of $43\%$ (from minimum a-DCF of $0.0508$ to $0.0289$).

What problem does this paper attempt to address?

This paper addresses the vulnerability of automatic speaker verification (ASV) systems to spoofing attacks. Specifically, it proposes a spoofing-robust ASV system optimized directly for the architecture-agnostic detection cost function (a-DCF). By combining a-DCF with binary cross-entropy (BCE) loss and introducing a novel threshold optimization technique, the work aims to improve the robustness and performance of the system and to strike a better balance between user convenience and security.

### Main Problems and Solutions

1. **Problem Description**:
   - **Vulnerability to Spoofing Attacks**: Existing ASV systems are easily exploited by spoofing attacks (such as replay attacks, text-to-speech synthesis, etc.).
   - **Limitations of Evaluation Metrics**: The traditional t-DCF (tandem detection cost function) applies only to tandem architectures and cannot be used for other types of systems.
2. **Solutions**:
   - **Introduction of a-DCF**: a-DCF is a new detection cost function that can evaluate spoofing-robust speaker verification systems of different architectures with only one set of detection scores and one detection threshold.
   - **Softening a-DCF**: Since a-DCF is based on hard error counting and is not differentiable, the paper "softens" a-DCF into a differentiable form so that it can be optimized with gradient descent.
   - **Joint Optimization of Model Parameters and Thresholds**: Through Algorithm 1, the neural network weights and the detection threshold are optimized simultaneously to minimize the a-DCF loss.

### Experimental Results

The paper reports experiments on the ASVspoof2019 dataset. Compared with a baseline trained with BCE loss only, the embedding fusion system trained with a-DCF and BCE improves minimum a-DCF from 0.1445 to 0.1254, a 13% relative improvement. An alternative non-linear score fusion approach yields a 43% relative improvement (from 0.0508 to 0.0289).
In addition, further optimizing the threshold improves system performance further.

### Key Formulas

- **a-DCF Formula**:

  \[ \text{a-DCF}(\tau_{sasv}) = C_{tar}^{miss} \cdot \pi_{tar} \cdot P_{tar}^{miss}(\tau_{sasv}) + C_{non}^{fa} \cdot \pi_{non} \cdot P_{non}^{fa}(\tau_{sasv}) + C_{spf}^{fa} \cdot \pi_{spf} \cdot P_{spf}^{fa}(\tau_{sasv}) \]

  where:
  - $C_{tar}^{miss}$, $C_{non}^{fa}$, and $C_{spf}^{fa}$ are the costs of a target miss, a non-target false alarm, and a spoof false alarm, respectively;
  - $\pi_{tar}$, $\pi_{non}$, and $\pi_{spf}$ are the prior probabilities of target, non-target, and spoofing attack trials, respectively;
  - $P_{tar}^{miss}$, $P_{non}^{fa}$, and $P_{spf}^{fa}$ are the target miss rate, non-target false alarm rate, and spoofing false alarm rate, respectively;
  - $\tau_{sasv}$ is the detection threshold.

- **Softened Error Rate**:

  \[ \hat{P}_{tar}^{miss}(\tau_{sasv}) = \frac{1}{N_{tar}} \sum_{x \in tar} \sigma(\tau_{sasv} - g(x)) \]

  \[ \hat{P}_{non}^{fa}(\tau_{sasv}) = \frac{1}{N_{non}} \sum_{x \in non} \sigma(g(x) - \tau_{sasv}) \]

  where $\sigma$ is the sigmoid function and $g(x)$ is the detection score the system assigns to trial $x$.
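The relationship between the hard a-DCF and its softened form can be illustrated with a minimal pure-Python sketch. Several things here are assumptions rather than details from the paper: the cost and prior values are illustrative placeholders, a trial is taken to be accepted when its score is at least the threshold, the spoof false alarm rate is softened with the same sigmoid pattern as the non-target one, and the function names (`hard_a_dcf`, `soft_a_dcf`, `soft_a_dcf_grad_tau`, `tune_threshold`) are hypothetical. The threshold update shown is plain gradient descent on the threshold alone, not the paper's Algorithm 1, which also updates the network weights.

```python
import math

# Illustrative placeholder costs and priors (assumptions, not the paper's values).
COSTS = {"miss": 1.0, "fa_non": 10.0, "fa_spf": 10.0}
PRIORS = {"tar": 0.9, "non": 0.05, "spf": 0.05}

# Toy detection scores for the three trial classes (made up for illustration).
TAR = [2.0, 3.0, -1.0, 4.0]
NON = [-3.0, -2.0, 1.0, -4.0]
SPF = [-1.5, 0.5, -2.5, -3.5]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _combine(p_miss, p_fa_non, p_fa_spf):
    """Weight the three per-class terms by cost times prior and add them."""
    return (COSTS["miss"] * PRIORS["tar"] * p_miss
            + COSTS["fa_non"] * PRIORS["non"] * p_fa_non
            + COSTS["fa_spf"] * PRIORS["spf"] * p_fa_spf)


def hard_a_dcf(tar, non, spf, tau):
    """a-DCF from hard error counts (assumed rule: accept when score >= tau)."""
    p_miss = sum(s < tau for s in tar) / len(tar)
    p_fa_non = sum(s >= tau for s in non) / len(non)
    p_fa_spf = sum(s >= tau for s in spf) / len(spf)
    return _combine(p_miss, p_fa_non, p_fa_spf)


def soft_a_dcf(tar, non, spf, tau):
    """Differentiable a-DCF: each error indicator is replaced by a sigmoid."""
    p_miss = sum(sigmoid(tau - s) for s in tar) / len(tar)
    p_fa_non = sum(sigmoid(s - tau) for s in non) / len(non)
    p_fa_spf = sum(sigmoid(s - tau) for s in spf) / len(spf)
    return _combine(p_miss, p_fa_non, p_fa_spf)


def soft_a_dcf_grad_tau(tar, non, spf, tau):
    """Analytic derivative of soft_a_dcf with respect to the threshold."""
    def dsig(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    # Raising tau increases soft misses and decreases soft false alarms.
    g_miss = sum(dsig(tau - s) for s in tar) / len(tar)
    g_fa_non = -sum(dsig(s - tau) for s in non) / len(non)
    g_fa_spf = -sum(dsig(s - tau) for s in spf) / len(spf)
    return _combine(g_miss, g_fa_non, g_fa_spf)


def tune_threshold(tar, non, spf, tau=0.0, lr=0.5, steps=200):
    """Gradient descent on the threshold only, with the scores held fixed."""
    for _ in range(steps):
        tau -= lr * soft_a_dcf_grad_tau(tar, non, spf, tau)
    return tau


if __name__ == "__main__":
    tau_star = tune_threshold(TAR, NON, SPF)
    print("hard a-DCF at tau=0:", hard_a_dcf(TAR, NON, SPF, 0.0))
    print("soft a-DCF at tau=0:", soft_a_dcf(TAR, NON, SPF, 0.0))
    print("tuned threshold:", tau_star)
```

In a real training loop the scores would come from a neural network and an automatic differentiation framework would supply the gradients for both the weights and the threshold; the hand-written derivative here only makes the dependence on the threshold explicit.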

Optimizing a-DCF for Spoofing-Robust Speaker Verification

a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

Enhancing Out-of-Domain Detection for Speech Spoofing Countermeasure Via Supervised Contrastive Learning

Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

Speaker-Aware Anti-Spoofing

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches

Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion.

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference.

Voice Presentation Attack Detection Using Convolutional Neural Networks

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples

Can spoofing countermeasure and speaker verification systems be jointly optimised?

An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning

Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

How to Boost Anti-Spoofing with X-Vectors.

Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning