Exploring data augmentation in bias mitigation against non-native-accented speech

Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg
2023-12-24
Abstract: Automatic speech recognition (ASR) should serve every speaker, not only the majority "standard" speakers of a language. To build inclusive ASR, mitigating the bias against speaker groups who speak in a "non-standard" or "diverse" way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system. Since this is a low-resource problem, we investigate the optimal type of data augmentation, i.e., speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to both native Flemish and non-native-accented Flemish, for bias mitigation. The results showed that specific types of data augmentation applied to both native and non-native-accented speech improve non-native-accented ASR, while applying data augmentation to the non-native-accented speech is more conducive to bias reduction. Combining both gave the largest bias reduction for human-machine interaction (HMI) as well as read-type speech.
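Two of the augmentation types named in the abstract are simple enough to sketch. The block below is a minimal pure-Python illustration, not the authors' pipeline: `speed_perturb` resamples a waveform by linear interpolation (the usual speed-perturbation idea, which changes tempo and pitch together), and `spec_augment` applies SpecAugment-style frequency and time masking to a log-mel spectrogram stored as a list of frames. The function names, default mask counts and widths, and the omission of SpecAugment's time warping are all assumptions made for illustration; separate pitch perturbation and the cross-lingual voice conversion methods are not covered.

```python
import random
from typing import List, Optional

Waveform = List[float]
Spectrogram = List[List[float]]  # rows are frames (time), columns are mel bins


def speed_perturb(samples: Waveform, factor: float) -> Waveform:
    """Resample a waveform so it plays `factor` times faster (illustrative sketch).

    The signal is read at positions 0, factor, 2*factor, ... using linear
    interpolation. At the original sample rate this changes tempo and pitch
    together (factor > 1: shorter and higher; factor < 1: longer and lower).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for k in range(n_out):
        pos = k * factor
        i = int(pos)
        frac = pos - i
        # Clamp at the last sample instead of reading past the end.
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


def spec_augment(
    spec: Spectrogram,
    num_freq_masks: int = 2,
    max_freq_width: int = 8,
    num_time_masks: int = 2,
    max_time_width: int = 20,
    mask_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a copy of `spec` with random frequency and time masks applied.

    Hypothetical defaults: the mask counts and widths are illustrative, not the
    paper's settings, and SpecAugment's time-warping step is omitted.
    """
    if rng is None:
        rng = random.Random()
    out = [frame[:] for frame in spec]  # copy so the caller's data is untouched
    n_frames = len(out)
    n_bins = len(out[0]) if n_frames else 0

    # Frequency masking: overwrite a band of consecutive mel bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = mask_value

    # Time masking: overwrite a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [mask_value] * n_bins

    return out
```

In a common speed-perturbation setup, each training utterance is kept and also copied at factors 0.9 and 1.1 (for example `speed_perturb(wave, 0.9)`), which triples the data offline. Masking, by contrast, is cheap enough to apply on the fly to features during training, so every epoch sees differently masked inputs.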
Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses bias against non-native-accented speakers in automatic speech recognition (ASR) systems. Specifically, it aims to reduce the bias of a Flemish ASR system against non-native-accented Flemish through data augmentation, because little non-native-accented training data is available (a low-resource problem). The study compares several types of data augmentation, namely speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to native Flemish and to non-native-accented Flemish speech. The results show that specific types of data augmentation applied to both native and non-native-accented speech improve ASR performance on non-native-accented speech, whereas applying augmentation to the non-native-accented speech is more effective at reducing bias. Combining the two gives the largest bias reduction for both human-machine interaction (HMI) speech and read speech.
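The summary refers to "bias" without defining how it is measured. One common way to operationalize it, assumed here rather than taken from the paper, is the gap in word error rate (WER) between the non-native-accented group and the native group. The sketch below computes corpus-level WER with a word-level edit distance and reports the absolute and relative gap; the function names and the toy Dutch sentences are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (reference transcript, ASR hypothesis)


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Pair]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def wer_bias(native: List[Pair], non_native: List[Pair]) -> Tuple[float, float]:
    """Hypothetical bias measure: (absolute, relative) WER gap, non-native vs. native."""
    wer_native = corpus_wer(native)
    wer_non_native = corpus_wer(non_native)
    absolute = wer_non_native - wer_native
    relative = absolute / wer_native if wer_native > 0 else float("inf")
    return absolute, relative
```

For example, with native pairs `("de kat zit op de mat", "de kat zat op de mat")` and `("ik ga naar huis", "ik ga naar huis")` the native WER is 0.1 (1 error in 10 words); with non-native pairs `("de kat zit op de mat", "de kat zat op mat")` and `("ik ga naar huis", "ik naar huis")` it is 0.3 (3 errors), so `wer_bias` returns an absolute gap of about 0.2 and a relative gap of about 2.0. In this framing, bias mitigation means shrinking that gap.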