Beamforming and Lightweight GRU Neural Network Combination Model for Multi-Channel Speech Enhancement

Zhengdong Cao, Dongmei Li
DOI: https://doi.org/10.1007/s11760-024-03263-5
IF: 1.583
2024-01-01
Signal Image and Video Processing
Abstract: In this paper, a multi-channel speech enhancement structure combining beamformers and lightweight neural networks is proposed. A first-order differential beamformer improves the signal-to-noise ratio of the target speech, which reduces the burden on the subsequent neural network and lowers the complexity that network requires. A two-stage gated recurrent neural network is employed, where the first-stage recurrent network processes the channel characteristics of the speech and the second handles its frequency-domain features. Frequency band division combined with convolution is used to further compress the network scale. With a suitable loss function, the speech enhancement performance of the model is further improved. The proposed model structure is trained, evaluated, and validated using simulated multi-channel microphone array datasets generated from the public TIMIT dataset. The results demonstrate that the model achieves good multi-channel speech enhancement performance with relatively small parameter and computational requirements compared to popular existing approaches.
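To illustrate the front-end idea only, the following is a minimal sketch of a generic two-microphone delay-and-subtract (first-order differential) beamformer response. It is not the authors' implementation: the function name `dma_response`, the 2 cm spacing, the rear-facing null at 180 degrees, and the speed of sound of 343 m/s are all assumptions chosen for illustration, and the abstract does not specify the array geometry.

```python
import numpy as np

def dma_response(freq_hz, theta_deg, spacing_m=0.02, null_deg=180.0, c=343.0):
    """Magnitude response of a two-microphone delay-and-subtract beamformer.

    All parameter defaults are illustrative assumptions, not values from the paper.
    theta_deg = 0 is the endfire direction in which the wave reaches mic 1 first.
    """
    omega = 2.0 * np.pi * freq_hz
    # Delay of the plane wave at mic 2 relative to mic 1 for a source at theta.
    tau_src = spacing_m * np.cos(np.deg2rad(theta_deg)) / c
    # Internal delay applied to mic 2 so that the two paths cancel at null_deg.
    tau_int = -spacing_m * np.cos(np.deg2rad(null_deg)) / c
    # Output = mic 1 minus the delayed mic 2 signal, per unit input amplitude.
    return np.abs(1.0 - np.exp(-1j * omega * (tau_src + tau_int)))

if __name__ == "__main__":
    # Sample the directivity pattern at 1 kHz from front (0 deg) to rear (180 deg).
    for angle in (0.0, 90.0, 180.0):
        print(angle, dma_response(1000.0, angle))
```

With these assumed settings the response vanishes for a source at 180 degrees and is largest toward 0 degrees, which is the sense in which such a beamformer can raise the signal-to-noise ratio of a frontal target before any neural network is applied.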