Exploring data augmentation in bias mitigation against non-native-accented speech

Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg

2023-12-24

Abstract: Automatic speech recognition (ASR) should serve every speaker, not only the majority "standard" speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a "non-standard" or "diverse" way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system. Since this is a low-resource problem, we investigate the optimal type of data augmentation, i.e., speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to both native Flemish and non-native-accented Flemish, for bias mitigation. The results showed that specific types of data augmentation applied to both native and non-native-accented speech improve non-native-accented ASR, while applying data augmentation to the non-native-accented speech is more conducive to bias reduction. Combining both gave the largest bias reduction for human-machine interaction (HMI) as well as read-type speech.

Audio and Speech Processing
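Two of the augmentation families named in the abstract are simple enough to sketch directly. The snippet below is a minimal illustration, not the authors' pipeline: it assumes a mono waveform and a (frequency x time) log-mel spectrogram stored as NumPy arrays, and the function names `speed_perturb` and `spec_augment` and all mask parameters are hypothetical. Speed perturbation is implemented here as plain resampling by linear interpolation, which changes tempo and pitch together; SpecAugment is reduced to frequency and time masking (time warping is omitted). Cross-lingual voice conversion is left out because it needs a trained conversion model.

```python
import numpy as np


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample a 1-D waveform so it plays `factor` times faster.

    Hypothetical helper: resampling shifts tempo and pitch together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(wave) / factor))
    # Fractional positions in the original signal read by each output sample;
    # np.interp clamps positions past the end to the last sample value.
    src = np.arange(n_out) * factor
    return np.interp(src, np.arange(len(wave)), wave)


def spec_augment(spec: np.ndarray, rng: np.random.Generator,
                 num_freq_masks: int = 2, max_f: int = 8,
                 num_time_masks: int = 2, max_t: int = 20) -> np.ndarray:
    """SpecAugment-style masking on a (freq, time) spectrogram (no time warping).

    Mask counts and maximum widths are illustrative assumptions.
    """
    out = spec.astype(float, copy=True)
    n_freq, n_time = out.shape
    fill = out.mean()  # masked cells are set to the utterance mean
    for _ in range(num_freq_masks):
        f = rng.integers(0, max_f + 1)                 # mask height in bins
        f0 = rng.integers(0, max(1, n_freq - f + 1))   # first masked bin
        out[f0:f0 + f, :] = fill
    for _ in range(num_time_masks):
        t = rng.integers(0, max_t + 1)                 # mask width in frames
        t0 = rng.integers(0, max(1, n_time - t + 1))   # first masked frame
        out[:, t0:t0 + t] = fill
    return out
```

In a setup like the one described, such transforms could be applied to native Flemish training utterances, to non-native-accented ones, or to both, producing extra training copies of each utterance.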

What problem does this paper attempt to address?

The paper addresses bias against non-native-accented speakers in automatic speech recognition (ASR) systems. Specifically, it aims to reduce the bias of a Flemish ASR system against non-native-accented Flemish. Because non-native-accented speech data are scarce, this is a low-resource problem, so the authors turn to data augmentation. They compare speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to native Flemish and to non-native-accented Flemish. The results show that specific types of data augmentation applied to both native and non-native-accented speech improve recognition of non-native-accented speech, whereas augmenting the non-native-accented speech is more effective for reducing bias. Combining the two gave the largest bias reduction for both human-machine interaction (HMI) and read speech.
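The summary does not say how bias is quantified. A common convention, assumed in the sketch below, is to compute the word error rate (WER) of each speaker group and take bias as the WER of a group minus the WER of a reference group (here, native speakers). The helpers `word_errors`, `group_wer`, and `wer_bias` are hypothetical names, and the example sentences are made up for illustration only.

```python
from typing import Dict, List, Tuple


def word_errors(ref: str, hyp: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    r, h = ref.split(), hyp.split()
    # Dynamic-programming Levenshtein distance over word tokens.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution or match
        prev = cur
    return prev[-1], len(r)


def group_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER for one group of (reference, hypothesis) pairs."""
    errors = total = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    if total == 0:
        raise ValueError("group has no reference words")
    return errors / total


def wer_bias(groups: Dict[str, List[Tuple[str, str]]],
             reference_group: str) -> Dict[str, float]:
    """Absolute WER gap of every other group relative to `reference_group`."""
    base = group_wer(groups[reference_group])
    return {g: group_wer(p) - base
            for g, p in groups.items() if g != reference_group}


groups = {
    "native": [("de kat zit op de mat", "de kat zit op de mat")],
    "non_native": [("de kat zit op de mat", "de kat zat op mat")],
}
print(wer_bias(groups, "native"))
```

Here the non-native hypothesis has one substitution and one deletion over six reference words, so its WER is 2/6 and, with a native WER of 0, the gap is about 0.33. Under this definition, "bias reduction" means shrinking that gap; a relative gap (dividing by the reference-group WER) is another option.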
