Adversarial Data Augmentation for Robust Speaker Verification

Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

2024-02-05

Abstract: Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.

Sound, Machine Learning, Audio and Speech Processing

What problem does this paper attempt to address?

This paper proposes adversarial data augmentation (A-DA) to address the "augmentation residual" problem in conventional data augmentation (DA) for speaker verification, the task of verifying whether a speech segment belongs to the claimed speaker. DA enriches the training data by simulating real-life acoustic variations, which helps the model ignore speaker-irrelevant acoustic changes. However, it can also introduce unwanted distortion caused by the different augmentation types, which the paper calls augmentation residual. To address this, the paper combines DA with adversarial learning: an additional augmentation classifier is trained to identify which augmentation type was applied, and the speaker network is trained to produce embeddings that deceive this classifier, making the learned embeddings more robust to augmentation variations. Experiments on the VoxCeleb and CN-Celeb datasets show that A-DA outperforms standard DA in both augmentation-matched and augmentation-mismatched test conditions, indicating better robustness and generalization to acoustic variations. The overall aim is to make deep speaker models more robust under complex acoustic conditions.
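The adversarial setup described above can be illustrated with a small sketch of the two competing objectives. This is not the authors' implementation: the text here does not say how the adversarial training is realized, so the min-max form, the weight `lam`, and all function names are assumptions made for illustration. The augmentation classifier minimizes its own cross-entropy, while the speaker network minimizes the speaker loss minus a weighted copy of that same cross-entropy, so it is rewarded when the classifier cannot tell which augmentation was applied.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one example given raw logits and an integer label."""
    return -math.log(softmax(logits)[target])


def adversarial_objectives(spk_logits, spk_label, aug_logits, aug_label, lam=0.5):
    """Return (speaker_network_loss, augmentation_classifier_loss).

    Hypothetical formulation: `lam` (an assumed hyperparameter) weights
    how strongly the speaker network is pushed to fool the classifier.
    """
    l_spk = cross_entropy(spk_logits, spk_label)
    l_aug = cross_entropy(aug_logits, aug_label)
    # The classifier minimizes l_aug; the speaker network minimizes
    # l_spk - lam * l_aug, i.e. it benefits when l_aug is large.
    return l_spk - lam * l_aug, l_aug


# Example: 3 speakers, 4 hypothetical augmentation types.
confused = adversarial_objectives([2.0, 0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], 2)
confident = adversarial_objectives([2.0, 0.0, 0.0], 0, [0.0, 0.0, 5.0, 0.0], 2)
print("classifier confused :", confused)
print("classifier confident:", confident)
```

When the classifier's output is uniform over augmentation types, the speaker network's loss is lower than when the classifier confidently identifies the augmentation, which is the pressure that is meant to remove augmentation-specific information from the embeddings. In practice such an objective is commonly trained with a gradient reversal layer or alternating updates; which of these the paper uses is not stated here.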

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

Feature Augmentation for Adversarial Robustness

Shift to Your Device: Data Augmentation for Device-Independent Speaker Verification Anti-Spoofing

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

Visualizing data augmentation in deep speaker recognition

Improving Speech Emotion Recognition With Adversarial Data Augmentation Network

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning

Diffusion-Based Adversarial Purification for Speaker Verification

AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

AugLy: Data Augmentations for Robustness

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Adversarial Speaker Verification

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

Targeted Augmented Data for Audio Deepfake Detection

Data Augmentation Can Improve Robustness

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation