An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework

Jun Du, Qing Wang, Yan-Hui Tu, Xiao Bao, Li-Rong Dai, Chin-Hui Lee
DOI: https://doi.org/10.1109/ASRU.2015.7404827
2015-01-01
Abstract: We present an information fusion approach to robust recognition of microphone array speech for the recently launched 3rd CHiME Challenge. It is based on a deep learning framework with a large neural network consisting of subnets with different architectures. Multiple knowledge sources are integrated in two stages: an early fusion, in which normalized noisy features from different beamforming techniques, speech-enhanced features, speaker-related features, and other auxiliary features are concatenated as the input to each subnet, and a late fusion, in which the outputs of all subnets are combined to produce a single output set. Our experiments demonstrate that all information sources are complementary in our proposed framework. Our best system achieves an average word error rate reduction of 68% from the officially released baseline results on the test set of real data.
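The two fusion stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature values, the dimensions, the `make_subnet` helper (a random linear layer with a softmax standing in for a trained subnet), and the choice of plain averaging for late fusion are all assumptions made for illustration, since the abstract does not specify how the subnet outputs are combined.

```python
import math
import random


def early_fusion(feature_streams):
    """Concatenate per-frame feature vectors from several sources into one input vector."""
    fused = []
    for stream in feature_streams:
        fused.extend(stream)
    return fused


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_subnet(input_dim, num_classes, seed):
    """Hypothetical stand-in for a trained subnet: one random linear layer plus softmax."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(input_dim)]
               for _ in range(num_classes)]

    def subnet(x):
        return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

    return subnet


def late_fusion(posteriors):
    """Average the class posteriors of all subnets (an assumed combination rule)."""
    count = len(posteriors)
    return [sum(p[k] for p in posteriors) / count
            for k in range(len(posteriors[0]))]


# Toy per-frame features; the values are made up for illustration.
beamformed = [0.2, -0.1, 0.4]   # noisy features after beamforming
enhanced = [0.3, 0.0]           # speech-enhanced features
speaker = [1.0, 0.5]            # speaker-related features

x = early_fusion([beamformed, enhanced, speaker])
subnets = [make_subnet(len(x), num_classes=4, seed=s) for s in range(3)]
fused_posterior = late_fusion([net(x) for net in subnets])
print(len(x), fused_posterior)
```

In this sketch every subnet receives the same concatenated input and differs only in its weights; in the framework the abstract describes, the subnets also differ in architecture.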