Deep Recurrent Neural Networks With Nonlinear Masking Layers And Two-Level Estimation For Speech Separation

Jiantao Zhang, Pingjian Zhang
DOI: https://doi.org/10.1007/978-3-030-30490-4_32
2019-01-01
Abstract: Monaural speech separation has been an interesting but challenging problem for decades. Its goal is to separate a target speech signal from background interference, and it has traditionally been treated as a signal processing problem. In recent years, rapid advances in deep learning have led to major breakthroughs in speech separation. This paper proposes recurrent neural networks (RNNs) that integrate multiple nonlinear masking layers (NMLs) to learn a two-level estimation for speech separation. Experimental results show that the proposed model, "RNN SMMs + 3 NMLs", outperforms the baseline RNN without any mask on all three indices (SDR, SIR and SAR), and also obtains much better SDR and SIR than an RNN that simply uses the original deterministic time-frequency masks.
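For readers unfamiliar with the "deterministic time-frequency mask" that the abstract uses as a point of comparison, the following is a minimal sketch, assuming the common soft ratio mask formulation from earlier RNN-based separation work. It is not the authors' nonlinear masking layer; the function name, the `eps` parameter, and the toy spectrogram values are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_mask_separate(est_speech, est_noise, mixture, eps=1e-8):
    """Apply a deterministic soft time-frequency mask (illustrative only).

    est_speech, est_noise: magnitude estimates of the two sources,
        shape (freq_bins, frames), e.g. the raw outputs of a network.
    mixture: magnitude spectrogram of the mixed signal, same shape.
    eps: small constant (an assumption) to avoid division by zero.
    """
    est_speech = np.abs(est_speech)
    est_noise = np.abs(est_noise)
    # The mask is a fixed function of the two estimates; nothing is learned.
    mask = est_speech / (est_speech + est_noise + eps)
    speech = mask * mixture
    noise = (1.0 - mask) * mixture
    return speech, noise

# Toy 2x2 "spectrograms" (hypothetical values).
mixture = np.array([[1.0, 2.0], [4.0, 0.5]])
est_speech = np.array([[3.0, 1.0], [1.0, 0.0]])
est_noise = np.array([[1.0, 1.0], [3.0, 2.0]])

speech, noise = soft_mask_separate(est_speech, est_noise, mixture)
```

By construction the two masked outputs sum back to the mixture magnitude in every time-frequency bin. The abstract suggests the proposed NMLs replace this fixed ratio with learned nonlinear layers, but the details of that design are not given here.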