Multi-Channel Feature Adaptation for Robust Speech Recognition

Zhaofeng Zhang, Xiong Xiao, Longbiao Wang, Jianwu Dang, Masahiro Iwahashi, Eng Siong Chng, Haizhou Li
DOI: https://doi.org/10.1109/iscslp.2016.7918435
2016-01-01
Abstract: In this paper, we propose a feature adaptation method that combines speech features from multiple microphone channels for robust automatic speech recognition (ASR). The proposed method first transforms the features in each channel using channel-dependent linear transforms, and then sums the channels into one channel for acoustic modeling. The transform parameters are estimated by maximizing the likelihood of the transformed features on a Gaussian mixture model (GMM) trained on clean features. To allow diagonal covariance matrices, and hence an efficient estimation algorithm, the likelihood function is evaluated in the cepstral domain, while the transformation is applied in the log Mel filterbank domain. We evaluate the proposed feature adaptation on the 6-channel evaluation data of the CHiME-3 task. Results show that the proposed method with diagonal channel-dependent transforms reduces the word error rate (WER) from 21.05% (best single channel) to 16.96% when a DNN-based acoustic model is used. This result is also slightly better than the 17.60% obtained by minimum variance distortionless response (MVDR) beamforming.
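
The pipeline described in the abstract (per-channel transform in the log Mel domain, summation across channels, likelihood evaluation in the cepstral domain) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array shapes, the per-channel bias term, and the use of an orthonormal DCT-II to move from log Mel to cepstral features are all assumptions made for illustration, and the paper's actual estimation algorithm for the transform parameters is not reproduced here. The sketch only computes the objective that such an algorithm would maximize.

```python
import numpy as np


def dct_matrix(n_ceps, n_mels):
    """Orthonormal DCT-II matrix mapping log Mel features to cepstra (assumed choice)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    C = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    C[0] /= np.sqrt(2.0)  # scale the first row so that all rows have unit norm
    return C


def combine_channels(feats, scales, biases):
    """Apply a diagonal transform to each channel, then sum over channels.

    feats:  (channels, frames, n_mels) log Mel filterbank features
    scales: (channels, n_mels) diagonal of each channel's transform
    biases: (channels, n_mels) per-channel offset (hypothetical, may be zero)
    """
    return np.sum(feats * scales[:, None, :] + biases[:, None, :], axis=0)


def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    x: (frames, dim); weights: (M,); means, variances: (M, dim)
    """
    diff = x[:, None, :] - means[None, :, :]
    log_comp = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )
    # log-sum-exp over mixture components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))


def objective(feats, scales, biases, C, weights, means, variances):
    """Likelihood objective: transform and sum in log Mel, score in cepstral domain."""
    combined = combine_channels(feats, scales, biases)  # (frames, n_mels)
    cepstra = combined @ C.T                            # (frames, n_ceps)
    return gmm_avg_loglik(cepstra, weights, means, variances)
```

Under these assumptions, a natural starting point is `scales` set to 1/6 for every channel and filterbank bin with zero `biases`, which reduces to plain averaging of the six channels; the parameters would then be adjusted to increase `objective` against a GMM trained on clean cepstral features.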