Abstract: SpecAugment is a very effective data augmentation method for both HMM-based and E2E-based automatic speech recognition (ASR) systems, and it works well even in low-resource scenarios. However, SpecAugment masks the time or frequency domain of the spectrum according to a fixed augmentation policy, which may provide relatively little data diversity for low-resource ASR. In this paper, we propose a policy-based SpecAugment (Policy-SpecAugment) method to alleviate this problem. The idea is to replace the fixed policy with an augmentation-select policy and an augmentation-parameter changing policy. These policies are learned from the validation-set loss, which is fed back to the corresponding augmentation policies. The aim is to encourage the model to learn from more diverse data, focusing on the kinds of data it relatively needs most. In experiments, we evaluate the effectiveness of our approach in a low-resource scenario, namely the 100-hour LibriSpeech task. The results and analysis show that our proposal clearly alleviates the above issue. In addition, compared with the state-of-the-art SpecAugment, the proposed Policy-SpecAugment achieves a relative WER reduction of more than 10% on the Test/Dev-clean sets, more than 5% on the Test/Dev-other sets, and an absolute WER reduction of more than 1% on all test sets.
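
The abstract does not give the exact update rules, so the following is only a minimal Python sketch under stated assumptions. It shows SpecAugment-style frequency and time masking, plus a hypothetical augmentation-select policy that samples each augmentation with probability proportional to the validation loss measured under it, and a hypothetical parameter-changing policy that widens the mask of the augmentation with the lowest validation loss (the one the model already handles best). The function names (`freq_mask`, `time_mask`, `selection_probs`, `update_widths`, `augment`) and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Spectrogram = List[List[float]]  # indexed as spec[time][freq]


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero one band of consecutive frequency channels across all frames."""
    out = [row[:] for row in spec]  # copy so the input is left untouched
    n_freq = len(out[0])
    width = rng.randint(0, min(max_width, n_freq))
    start = rng.randint(0, n_freq - width)
    for row in out:
        for f in range(start, start + width):
            row[f] = 0.0
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero one block of consecutive time frames across all channels."""
    out = [row[:] for row in spec]
    n_time = len(out)
    width = rng.randint(0, min(max_width, n_time))
    start = rng.randint(0, n_time - width)
    for t in range(start, start + width):
        out[t] = [0.0] * len(out[t])
    return out


AUGMENTATIONS: Dict[str, Callable[[Spectrogram, int, random.Random], Spectrogram]] = {
    "freq_mask": freq_mask,
    "time_mask": time_mask,
}


def selection_probs(val_losses: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical select policy: probability proportional to validation loss."""
    total = sum(val_losses.values())
    if total <= 0:
        raise ValueError("validation losses must sum to a positive value")
    return {name: loss / total for name, loss in val_losses.items()}


def update_widths(widths: Dict[str, int], val_losses: Dict[str, float],
                  step: int = 1, cap: int = 40) -> Dict[str, int]:
    """Hypothetical parameter-changing policy: widen the mask of the
    augmentation with the lowest validation loss, up to a cap."""
    easiest = min(val_losses, key=val_losses.get)
    new = dict(widths)
    new[easiest] = min(new[easiest] + step, cap)
    return new


def augment(spec: Spectrogram, widths: Dict[str, int], probs: Dict[str, float],
            rng: random.Random) -> Tuple[str, Spectrogram]:
    """Sample one augmentation according to `probs` and apply it."""
    names = list(probs)
    name = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
    return name, AUGMENTATIONS[name](spec, widths[name], rng)


if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[1.0] * 80 for _ in range(100)]  # toy 100-frame, 80-channel input
    val_losses = {"freq_mask": 2.0, "time_mask": 6.0}  # made-up losses
    probs = selection_probs(val_losses)  # {'freq_mask': 0.25, 'time_mask': 0.75}
    widths = update_widths({"freq_mask": 27, "time_mask": 40}, val_losses)
    name, augmented = augment(spec, widths, probs, rng)
    print(name, widths)
```

Sampling in proportion to validation loss is one simple way to steer training toward the augmentations the model currently handles worst; the paper may use a different rule, so treat both policy functions as placeholders to be replaced by the actual method.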

Investigation of SpecAugment for Deep Speaker Embedding Learning

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition

Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition

On the Effectiveness of Enrollment Speech Augmentation for Target Speaker Extraction

Speaker Augmentation for Low Resource Speech Recognition

Speaker Embedding Augmentation with Noise Distribution Matching

Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Adversarial Data Augmentation for Robust Speaker Verification

Speech Augmentation via Speaker-Specific Noise in Unseen Environment

A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR

Shift to Your Device: Data Augmentation for Device-Independent Speaker Verification Anti-Spoofing

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

Towards Robust Speaker Verification with Target Speaker Enhancement

On Using SpecAugment for End-to-End Speech Translation

Data Augmentation Enhanced Speaker Enrollment for Text-Dependent Speaker Verification

A Multi-task Framework of Speaker Recognition with TTS Data Augmentation

A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit

Data Augmentation for End-to-end Code-switching Speech Recognition