Abstract: Margin-based losses, especially one-class classification loss, have improved the generalization capabilities of countermeasure systems (CMs), but their reliability has not been tested on spoofing attacks degraded by channel variation. Our experiments tackle this in two ways: first, by investigating the impact of various codec simulations and their corresponding parameters, namely bit-rate, discontinuous transmission (DTX), and loss, on the performance of the one-class classification-based CM system; second, by testing the efficacy of various settings of margin-based losses for training and evaluating our CM system on codec-simulated data. Multi-conditional training (MCT), along with various data-feeding and custom mini-batching strategies, was also explored to handle the added variability in the new data setting and to find an optimal setting for carrying out the above experiments. Our experimental results reveal that a strict constraint on the embedding space degrades the performance of the one-class classification model. MCT yields a relative performance improvement of 35.55%, and custom mini-batching captures more generalized features for the new data setting. Varying the codec parameters also has a significant impact on the performance of the countermeasure system.

### What problems does this paper attempt to solve?

This paper explores how channel variation affects the performance of speech spoofing detection systems based on one-class learning. Specifically, the researchers examine how countermeasure systems (CMs) perform under different codec simulation conditions. The main problems the paper addresses are:

1. **The impact of channel variation on the performance of CM systems**:
   - The researchers simulate different codec parameters (such as bit rate, discontinuous transmission (DTX), and packet loss rate) and evaluate their impact on the performance of CM systems.
   - In particular, they test the CM system under different settings of these parameters to understand which ones have a significant impact on performance.
2. **The effects of multi-condition training (MCT) and custom mini-batching strategies**:
   - The researchers explore MCT together with different data-feeding and custom mini-batching strategies to handle the new variability introduced by channel variation and to find the optimal training settings.
   - For example, they test how random mini-batches and custom mini-batching strategies (such as batches containing equal numbers of spoofed and real samples, or batches containing samples from the same speaker or the same codec simulation) affect the generalization ability of the model.
3. **The effectiveness of different loss functions**:
   - The researchers compare the performance of several loss functions (such as Softmax, AM-Softmax, and OC-Softmax) on codec-simulated data.
   - They pay particular attention to the effect of strictly constraining the embedding space of real samples and explore how different loss function settings affect model performance.
4. **The generalization ability of the model in real-world scenarios**:
   - The researchers use the ASVspoof 2021 evaluation set to test how well the model generalizes to new conditions and to verify its reliability in real-world applications.

Through these experiments, the researchers hope to find a spoofing detection method that maintains good performance in the presence of channel variation, thereby improving the security of Automatic Speaker Verification (ASV) systems.

### Formula summary

The formulas involved in the paper include:

- **Softmax Loss**:
  \[ L_S=\frac{1}{N}\sum_{i = 1}^{N}\log\left(1 + e^{(\mathbf{w}_{1 - y_i}-\mathbf{w}_{y_i})^T\mathbf{x}_i}\right) \]
  where \(N\) is the number of samples in a mini-batch, \(\mathbf{x}_i\in\mathbb{R}^D\) and \(y_i\in\{0, 1\}\) are the embedding and label respectively, and \(\mathbf{w}_0,\mathbf{w}_1\in\mathbb{R}^D\) are the weight vectors of the two classes.
- **AM-Softmax Loss**:
  \[ L_{AMS}=\frac{1}{N}\sum_{i = 1}^{N}\log\left(1 + e^{\alpha\left(m - (\hat{\mathbf{w}}_{y_i}-\hat{\mathbf{w}}_{1 - y_i})^T\hat{\mathbf{x}}_i\right)}\right) \]
  where \(\hat{\mathbf{w}}\) and \(\hat{\mathbf{x}}_i\) are the L2-normalized weight vectors and embedding, \(\alpha\) is a scale factor, and \(m\) is the margin.
- **OC-Softmax Loss**:
  \[ L_{OCS}=\frac{1}{N}\sum_{i = 1}^{N}\log\left(1 + e^{\alpha\left(m_{y_i}-\hat{\mathbf{w}}_0^T\hat{\mathbf{x}}_i\right)(-1)^{y_i}}\right) \]
  where \(\hat{\mathbf{w}}_0\) is the normalized weight vector of the target class and \(m_{y_i}\) is the class-dependent margin.

Impact of Channel Variation on One-Class Learning for Spoof Detection
