A Study of Learning Based Beamforming Methods for Speech Recognition

Xiong Xiao, Chenglin Xu, Zhaofeng Zhang, Shengkui Zhao, Sining Sun, Shinji Watanabe, Longbiao Wang, Lei Xie, Douglas L. Jones, Eng Siong Chng, Haizhou Li
2016-01-01
Abstract: This paper presents a comparative study of three learning-based beamforming methods designed specifically for robust speech recognition. The three methods are: 1) a neural network that predicts beamforming weights from generalized cross correlation (GCC) features; 2) a neural network that predicts a time-frequency (TF) mask, which is used to estimate MVDR (minimum variance distortionless response) beamforming weights; and 3) maximum likelihood estimation of beamforming weights to fit the enhanced features to a Gaussian mixture model trained on clean speech. All three methods operate in the frequency domain. They are evaluated on the CHiME-4 speech recognition benchmark and compared with traditional delay-and-sum and MVDR beamforming on the same task. Discussions and future research directions are presented.
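To make the second method more concrete, the following is a minimal sketch of how a TF mask could be turned into MVDR weights. It is not the authors' implementation: the abstract does not give the formulation, so this sketch assumes a commonly used reference-channel form, w(f) = Φ_N(f)^{-1} Φ_S(f) u / trace(Φ_N(f)^{-1} Φ_S(f)), where Φ_S and Φ_N are mask-weighted spatial covariance matrices and u selects a reference microphone. The function name `mask_based_mvdr`, the parameters `ref_channel` and `diag_load`, and the array layout are hypothetical, and the mask is taken as given rather than predicted by a network.

```python
import numpy as np


def mask_based_mvdr(stft, speech_mask, ref_channel=0, diag_load=1e-6):
    """Sketch of mask-based MVDR beamforming (assumed formulation, see lead-in).

    stft:        complex array of shape (channels, frames, freqs)
    speech_mask: real array of shape (frames, freqs), values in [0, 1]
    Returns the enhanced STFT (frames, freqs) and the weights (freqs, channels).
    """
    n_ch, n_frames, n_freq = stft.shape
    noise_mask = 1.0 - speech_mask  # assumption: noise mask is the complement
    enhanced = np.zeros((n_frames, n_freq), dtype=complex)
    weights = np.zeros((n_freq, n_ch), dtype=complex)

    for f in range(n_freq):
        X = stft[:, :, f]  # (channels, frames) for this frequency bin
        ms, mn = speech_mask[:, f], noise_mask[:, f]

        # Mask-weighted spatial covariance matrices of speech and noise.
        phi_s = (ms * X) @ X.conj().T / max(ms.sum(), 1e-10)
        phi_n = (mn * X) @ X.conj().T / max(mn.sum(), 1e-10)

        # Diagonal loading keeps the noise covariance invertible.
        load = diag_load * np.trace(phi_n).real / n_ch + 1e-10
        phi_n = phi_n + load * np.eye(n_ch)

        # w = (Phi_N^{-1} Phi_S) u / trace(Phi_N^{-1} Phi_S)
        numerator = np.linalg.solve(phi_n, phi_s)
        w = numerator[:, ref_channel] / (np.trace(numerator) + 1e-10)

        weights[f] = w
        enhanced[:, f] = w.conj() @ X  # y(t, f) = w(f)^H x(t, f)

    return enhanced, weights
```

With an ideal mask and a rank-one speech covariance, these weights pass the speech component at the reference microphone through undistorted, which is the "distortionless" constraint in MVDR. In practice the mask is only an estimate, so the covariance matrices and the resulting weights are approximate.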