Improved BLSTM RNN Based Accent Speech Recognition Using Multi-task Learning and Accent Embeddings

Wenbi Rao, Ji Zhang, Jianwei Wu
DOI: https://doi.org/10.1145/3388818.3389159
2020-03-20
Abstract: A major challenge for automatic speech recognition (ASR) systems for Mandarin is handling speakers with different accents. ASR systems trained with single-task learning underperform on a new accent because they generalize poorly. In this paper, we explore how to use accent information, through accent embeddings and multi-task learning, on top of a bidirectional long short-term memory (BLSTM) network to improve accented speech recognition. First, we augment the speech input with accent information in the form of embeddings extracted by a standalone network. Then we propose a multi-task learning architecture that jointly learns an accent classifier and a multi-accent acoustic model. Experiments with these methods show a 4% average relative improvement in word error rate over a multi-accent baseline system.
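The abstract names three ideas that a small sketch can make concrete: appending an accent embedding to the acoustic input, combining a primary acoustic loss with an auxiliary accent-classification loss, and reporting a relative change in word error rate. The Python below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the embedding is attached to the input or how the two losses are weighted, so per-frame concatenation, the interpolation form of the loss, the `accent_weight` value, and all function names are hypothetical. The WER figures in the usage note are made-up numbers chosen to show the arithmetic, not results from the paper.

```python
import math


def augment_frames(frames, accent_embedding):
    """Append one utterance-level accent embedding to every acoustic frame.

    Assumption: simple per-frame concatenation; the abstract only says the
    input is augmented with embeddings from a standalone network.
    """
    return [list(frame) + list(accent_embedding) for frame in frames]


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def multitask_loss(acoustic_probs, acoustic_target,
                   accent_probs, accent_target, accent_weight=0.1):
    """Interpolate the primary (acoustic) and auxiliary (accent) losses.

    Assumption: a weighted sum with a hypothetical accent_weight; the
    abstract does not state how the two tasks are balanced.
    """
    primary = cross_entropy(acoustic_probs, acoustic_target)
    auxiliary = cross_entropy(accent_probs, accent_target)
    return (1.0 - accent_weight) * primary + accent_weight * auxiliary


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative (not absolute) reduction in word error rate."""
    return (baseline_wer - new_wer) / baseline_wer
```

As a usage note, with an illustrative baseline WER of 25.0% and a new WER of 24.0%, `relative_wer_improvement(25.0, 24.0)` gives a 4% relative improvement even though the absolute difference is only one percentage point. This is the sense in which "relative improvement" is used in the abstract.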