Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.

Fenglin Ding, Wu Guo, Lirong Dai, Jun Du
DOI: https://doi.org/10.1109/icassp40776.2020.9053967
2019-01-01
Abstract: In this paper, we propose a novel adaptive technique that uses an attention-based gated scaling (AGS) scheme to improve deep feature learning for connectionist temporal classification (CTC) acoustic modeling. In AGS, the outputs of each hidden layer of the main network are scaled by an auxiliary gate matrix extracted from the lower layer using attention mechanisms. Furthermore, the auxiliary AGS layer and the main network are jointly trained without requiring second-pass model training or additional speaker information, such as speaker codes. On the Mandarin AISHELL-1 dataset, the proposed AGS yields a 7.94% character error rate (CER). To the best of our knowledge, this is the best recognition accuracy achieved on this dataset with an end-to-end framework.
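The gating idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the attention form, the layer type, or the gate nonlinearity, so the scaled dot-product self-attention, the tanh feed-forward layer, the sigmoid gate, and all names here (`ags_layer`, `W_main`, `W_q`, `W_k`, `W_gate`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ags_layer(lower, W_main, W_q, W_k, W_gate):
    """Hypothetical AGS-style layer (illustrative assumption).

    lower: (T, D_in) outputs of the lower layer for T frames.
    Returns the gated hidden output (T, D_out) and the gate (T, D_out).
    """
    # Main-network hidden output (a tanh layer is assumed here).
    hidden = np.tanh(lower @ W_main)
    # Auxiliary branch: self-attention over the lower-layer frames.
    q = lower @ W_q
    k = lower @ W_k
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, T)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    context = weights @ lower                 # (T, D_in)
    # Gate matrix with entries in (0, 1), one per frame and hidden unit.
    gate = sigmoid(context @ W_gate)
    # Element-wise scaling of the hidden output by the gate.
    return hidden * gate, gate

rng = np.random.default_rng(0)
T, D_in, D_att, D_out = 5, 8, 4, 6
lower = rng.standard_normal((T, D_in))
W_main = 0.1 * rng.standard_normal((D_in, D_out))
W_q = 0.1 * rng.standard_normal((D_in, D_att))
W_k = 0.1 * rng.standard_normal((D_in, D_att))
W_gate = 0.1 * rng.standard_normal((D_in, D_out))

scaled, gate = ags_layer(lower, W_main, W_q, W_k, W_gate)
print(scaled.shape, gate.shape)
```

Because the gate is computed from the same lower-layer output in the same forward pass, a single CTC loss on the final output would propagate gradients to both the main weights and the gate weights, which is one way to read the joint training described in the abstract.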