Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge.

Feng Ma, Li Chai, Jun Du, Diyuan Liu, Zhongfu Ye, Chin-Hui Lee
DOI: https://doi.org/10.21437/interspeech.2019-2601
2019-01-01
Abstract: CHiME-5 is a research community challenge targeting far-field, multi-talker conversational speech recognition in dinner party scenarios involving background noise, reverberation, and overlapping speech. In this study, we present five different kinds of robust acoustic models that take advantage of both effective data augmentation and ensemble methods to improve recognition performance for the CHiME-5 challenge. First, we detail the data augmentation used for far-field scenarios, especially the far-field data simulation. Unlike conventional data simulation methods, we use a signal processing method originally developed for channel identification to estimate the room impulse responses, and then use them to simulate the far-field data. Second, we introduce the five different kinds of robust acoustic models. Finally, the effectiveness of our acoustic model ensembling strategies at the lattice level and the state posterior level is evaluated and demonstrated. Our system achieves the best performance on all four tasks among the systems submitted to the CHiME-5 challenge.
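As a rough illustration of two ideas named in the abstract, the sketch below simulates far-field speech by convolving a close-talk signal with a room impulse response and adding noise, then combines acoustic models at the state posterior level by weighted averaging. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not describe the channel identification procedure, so the impulse response `rir` is assumed to be already estimated, and the function names, the SNR-based noise mixing, and the ensemble weights are all hypothetical. Lattice-level ensembling is not shown.

```python
import numpy as np


def simulate_far_field(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate close-talk speech and add noise.

    Assumes `rir` was estimated elsewhere and that `noise` is at least
    as long as `clean`.
    """
    # Apply the room impulse response, keeping the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise so the mixture reaches the requested SNR in dB.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


def ensemble_posteriors(posteriors, weights=None):
    """Hypothetical helper: weighted average of per-frame state posteriors.

    `posteriors` is a list of arrays, one per model, each shaped
    (frames, states). Weights default to uniform and are normalized.
    """
    stacked = np.stack(posteriors)  # (models, frames, states)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis: result is (frames, states).
    return np.tensordot(weights, stacked, axes=1)
```

Averaging posteriors keeps each frame a valid distribution as long as every input row sums to one, which is why the weights are normalized before combining.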