Abstract: A major difficulty in speech spoofing detection lies in generalizing to unknown forgery methods. However, most previous methods do not consider how silence information interferes with the generalization performance of speech spoofing detection. Notably, we observe experimentally that the generalization performance of existing methods drops sharply when silence segments are trimmed. This indicates that previous works have two problems: a) they do not remove the interference of silence and over-rely on silence information, and b) they lack the ability to uncover general forgery traces in utterance segments. To solve these two problems, we propose a novel Silence-Agnostic Speech Spoofing Detection (SASSD) framework. Specifically, unlike previous methods trained on speech samples with silence information, we completely remove the leading and trailing silence segments from all speech samples to eliminate the interference of silence and focus on utterance information. Meanwhile, to uncover general forgery traces in utterance segments and improve generalization, we view speech spoofing detection as a domain generalization problem and employ meta-learning to simulate actual domain shift scenarios, which reduces overfitting to specific forgery methods. In addition, to improve the domain generalization of meta-learning, we propose a novel data augmentation method named ShuffleMix. Unlike previous methods that only consider inter-speech patterns, our method additionally introduces an intra-speech augmentation technique, performing augmentation both within a single speech sample and across multiple speech samples to generate more diverse forged samples. Extensive experiments show that our method achieves state-of-the-art results on the ASVspoof 2019 LA dataset. In particular, our method achieves an equal error rate (EER) of 0.231% on the original dataset with silence information and 2.529% on the silence-trimmed dataset.
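To make the silence-trimming step concrete, the following is a minimal sketch of removing leading and trailing silence from a waveform. The abstract does not say how silence is detected, so the frame-energy threshold used here is an assumption, and the function name `trim_silence` and its parameters (`frame_len`, `hop`, `threshold_db`) are hypothetical, not taken from the paper.

```python
import numpy as np


def trim_silence(wave, frame_len=400, hop=160, threshold_db=-40.0):
    """Drop leading and trailing frames whose RMS energy falls below
    threshold_db relative to the loudest frame (assumed criterion)."""
    wave = np.asarray(wave, dtype=np.float64)
    if wave.size < frame_len:
        return wave  # too short to frame; return unchanged

    # Slice the waveform into overlapping frames and compute per-frame RMS.
    n_frames = 1 + (wave.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(wave[idx] ** 2, axis=1))

    peak = rms.max()
    if peak == 0.0:
        return wave[:0]  # the whole signal is silent

    # Energy of each frame in dB relative to the loudest frame.
    db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    voiced = np.flatnonzero(db > threshold_db)

    # Keep everything from the first to the last non-silent frame;
    # silence inside the utterance is left untouched.
    start = voiced[0] * hop
    end = voiced[-1] * hop + frame_len
    return wave[start:end]
```

With 16 kHz audio, the default values correspond to 25 ms frames and a 10 ms hop. Only the edges are trimmed, which matches the abstract's description of removing leading and trailing silence rather than all pauses.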

A Light CNN with Split Batch Normalization for Spoofed Speech Detection Using Data Augmentation

End-to-end Spoofing Speech Detection and Knowledge Distillation under Noisy Conditions

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Voice Presentation Attack Detection Using Convolutional Neural Networks

Enhancing Out-of-Domain Detection for Speech Spoofing Countermeasure Via Supervised Contrastive Learning

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Improved LightCNN with Attention Modules for ASV Spoofing Detection

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples

Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection

ConvNeXt Based Neural Network for Audio Anti-Spoofing

Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

Adversarial Voice Conversion Against Neural Spoofing Detectors

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

How to Boost Anti-Spoofing with X-Vectors

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion

One-Class Neural Network With Directed Statistics Pooling for Spoofing Speech Detection

STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

Generalizable Speech Spoofing Detection Against Silence Trimming with Data Augmentation and Multi-task Meta-Learning