An Experimental Study on Joint Modeling of Mixed-Bandwidth Data via Deep Neural Networks for Robust Speech Recognition

Jianqing Gao, Jun Du, Changqing Kong, Huaifang Lu, Enhong Chen, Chin-Hui Lee
DOI: https://doi.org/10.1109/ijcnn.2016.7727253
2016-01-01
Abstract: We propose joint modeling strategies that leverage large-scale mixed-band training speech to recognize both narrowband and wideband data with deep neural networks (DNNs). We use conventional down-sampling and up-sampling schemes to convert between narrowband and wideband data. We also explore DNN-based speech bandwidth expansion (BWE) to map certain acoustic features from narrowband to wideband speech. By arranging narrowband and wideband features at the input or output level of the BWE-DNN, and by combining down-sampled and up-sampled data, different DNNs can be built. Experiments on a Mandarin speech recognition task show that the hybrid DNNs for joint modeling of mixed-band speech yield significant gains over narrowband and wideband models that were each well trained separately, with relative character error rate (CER) reductions of 7.9% and 3.9% on narrowband and wideband data, respectively. Furthermore, the proposed strategies consistently outperform other conventional DNN-based methods.
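The conventional down-sampling and up-sampling step that bridges the two bandwidths can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes the common convention of 16 kHz wideband and 8 kHz narrowband audio (the abstract states no rates), and the helpers `lowpass_fir`, `downsample_wb_to_nb`, and `upsample_nb_to_wb` are hypothetical, using a 101-tap Hamming-windowed-sinc low-pass filter in place of whatever resampler the paper actually used.

```python
import numpy as np

# Assumed sampling rates (not stated in the abstract): a common convention.
WB_RATE = 16000  # wideband, Hz
NB_RATE = 8000   # narrowband, Hz


def lowpass_fir(cutoff: float, num_taps: int = 101) -> np.ndarray:
    """Hamming-windowed-sinc low-pass FIR filter (hypothetical helper).

    `cutoff` is a fraction of the Nyquist frequency (0 < cutoff <= 1).
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2  # symmetric tap indices (zero phase)
    h = cutoff * np.sinc(cutoff * n)              # ideal low-pass impulse response
    h *= np.hamming(num_taps)                     # window to reduce ripple
    return h / h.sum()                            # normalize to unit gain at DC


def downsample_wb_to_nb(x: np.ndarray) -> np.ndarray:
    """16 kHz -> 8 kHz: anti-alias filter at 4 kHz, then keep every 2nd sample."""
    h = lowpass_fir(0.5)                          # 0.5 * 8 kHz Nyquist = 4 kHz
    return np.convolve(x, h, mode="same")[::2]


def upsample_nb_to_wb(x: np.ndarray) -> np.ndarray:
    """8 kHz -> 16 kHz: zero insertion, then interpolation filter at 4 kHz."""
    z = np.zeros(2 * len(x))
    z[::2] = x                                    # insert a zero between samples
    h = lowpass_fir(0.5)                          # removes spectral images above 4 kHz
    return 2.0 * np.convolve(z, h, mode="same")   # x2 restores the original amplitude


if __name__ == "__main__":
    t = np.arange(WB_RATE) / WB_RATE              # 1 s of wideband samples
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
    nb = downsample_wb_to_nb(x)                   # the 6 kHz tone is filtered out
    wb = upsample_nb_to_wb(nb)                    # back at 16 kHz, but band-limited
    spec = np.abs(np.fft.rfft(wb))                # 1 Hz bins for a 1 s signal
    print(len(nb), len(wb), spec[6000] / spec[1000])
```

Running the round trip on a 1 kHz plus 6 kHz test tone keeps the 1 kHz component but loses the 6 kHz one: up-sampling restores the wideband sampling rate, not the 4 to 8 kHz band. That missing band is one way to see why a DNN-based BWE that estimates wideband features from narrowband ones is worth exploring alongside plain resampling.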
What problem does this paper attempt to address?