Adaptive Speaker Normalization for CTC-Based Speech Recognition

Penguin Ding, Wu Guo, Bin Gu, Zhenhua Ling, Jun Du
DOI: https://doi.org/10.21437/interspeech.2020-1390
2020-01-01
Abstract: In this paper, we propose a new speaker normalization technique for acoustic model adaptation in connectionist temporal classification (CTC)-based automatic speech recognition. In the proposed method, the mean and variance of each activation at the input of a hidden layer are first estimated at the speaker level. Each speaker's representation is then normalized independently so that it follows a standard normal distribution. Furthermore, we propose using an auxiliary network to dynamically generate the scaling and shifting parameters of speaker normalization, and we introduce an attention mechanism to improve performance. The experiments are conducted on the public Chinese dataset AISHELL-1. The proposed methods are effective at adapting the CTC model, achieving up to a 17.5% character error rate improvement over the speaker-independent (SI) model.
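The normalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `speaker_normalize`, the `eps` stabilizer, and the use of a biased variance estimate are assumptions made for illustration, and the `gamma` and `beta` arguments are fixed placeholders standing in for the scaling and shifting parameters that the paper's auxiliary network would generate. The attention mechanism is not modeled here.

```python
from collections import defaultdict
from math import sqrt


def speaker_normalize(frames, speaker_ids, gamma=None, beta=None, eps=1e-5):
    """Normalize hidden-layer inputs per speaker, then scale and shift.

    frames: list of activation vectors (one list of floats per frame).
    speaker_ids: speaker label for each frame, same length as frames.
    gamma, beta: per-dimension scale and shift. These are hypothetical
        placeholders for values an auxiliary network would produce;
        they default to 1 and 0 (plain normalization).
    eps: small constant (an assumption) to avoid division by zero.
    """
    dim = len(frames[0])
    gamma = gamma if gamma is not None else [1.0] * dim
    beta = beta if beta is not None else [0.0] * dim

    # Group frame indices by speaker so statistics are speaker-level.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    out = [None] * len(frames)
    for idxs in by_speaker.values():
        n = len(idxs)
        # Per-dimension mean and biased variance over this speaker's frames.
        mean = [sum(frames[i][d] for i in idxs) / n for d in range(dim)]
        var = [
            sum((frames[i][d] - mean[d]) ** 2 for i in idxs) / n
            for d in range(dim)
        ]
        for i in idxs:
            out[i] = [
                gamma[d] * (frames[i][d] - mean[d]) / sqrt(var[d] + eps) + beta[d]
                for d in range(dim)
            ]
    return out


# Two speakers whose activations live on very different scales.
frames = [[1.0, 10.0], [3.0, 30.0], [100.0, 0.0], [300.0, 4.0]]
speaker_ids = ["a", "a", "b", "b"]
normalized = speaker_normalize(frames, speaker_ids)
```

After this step, each speaker's activations have approximately zero mean and unit variance per dimension, regardless of the scale they started on, which is the property the abstract relies on before the learned scale and shift are applied.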