Acoustic Modeling for Multi-Array Conversational Speech Recognition in the CHiME-6 Challenge

Li Chai, Jun Du, Di-Yuan Liu, Yan-Hui Tu, Chin-Hui Lee
DOI: https://doi.org/10.1109/slt48900.2021.9383628
2021-01-01
Abstract: This paper presents our main contributions to acoustic modeling for multi-array, multi-talker speech recognition in the CHiME-6 Challenge, exploring different strategies for acoustic data augmentation and neural network architectures. First, enhanced data produced by our front-end network preprocessing and spectral augmentation are investigated and shown to be effective in improving speech recognition performance. Second, several neural network architectures are explored, built from different combinations of a deep residual network (ResNet), a factorized time delay neural network (TDNNF) and a residual bidirectional long short-term memory (RBiLSTM). Finally, multiple acoustic models are combined via minimum Bayes risk fusion. Compared with the official baseline acoustic model, the proposed solution achieves a relative word error rate reduction of 19% for the best single ASR system on the evaluation data, and it is one of the main contributions to our top system for the Track 1 tasks of the CHiME-6 Challenge.
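The "spectral augmentation" mentioned in the abstract refers to masking-based augmentation of the acoustic input features. The sketch below assumes a SpecAugment-style scheme: random frequency masks and time masks applied to a T x F log-mel feature matrix, with time warping omitted. The function name `spec_augment`, its parameters, the default mask counts and widths, and the choice to fill masked cells with 0.0 are all hypothetical illustrations, not the configuration reported in the paper.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]  # features[t][f]: T frames x F frequency bins


def spec_augment(
    features: Matrix,
    num_freq_masks: int = 2,
    max_freq_width: int = 8,
    num_time_masks: int = 2,
    max_time_width: int = 20,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return a masked copy of `features` (SpecAugment-style, no time warping).

    Hypothetical defaults for illustration only; not the paper's settings.
    Masked cells are set to 0.0 (assumes mean-normalized features).
    """
    rng = rng or random.Random()
    out = [row[:] for row in features]  # copy so the caller's input is untouched
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # Frequency masking: zero a band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for t in range(num_frames):
            for f in range(start, start + width):
                out[t][f] = 0.0

    # Time masking: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins

    return out


# Usage on a dummy 100-frame, 40-bin input with a fixed seed.
feats = [[1.0] * 40 for _ in range(100)]
augmented = spec_augment(feats, rng=random.Random(0))
```

Because each call draws fresh mask positions, the same utterance yields a different masked view every epoch, which is what makes masking useful as data augmentation rather than a fixed preprocessing step.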