Deep Learning Based Speech Separation Via NMF-Style Reconstructions.

Shuai Nie,Shan Liang,Wenju Liu,Xueliang Zhang,Jianhua Tao
DOI: https://doi.org/10.1109/taslp.2018.2851151
2018-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract:Deep learning based speech separation usually uses a supervised algorithm to learn a mapping function from noisy features to separation targets. These separation targets, either ideal masks or magnitude spectrograms, have prominent spectro-temporal structures. Nonnegative matrix factorization (NMF) is a well-known representation learning technique that is capable of capturing the basic spectral structures. Therefore, the combination of deep learning and NMF as an organic whole is a smart strategy. However, previous methods typically use deep neural networks (DNN) and NMF for speech separation in a separate manner. In this paper, we propose a jointly combinatorial scheme to concentrate the strengths of both DNN and NMF for speech separation. NMF is used to learn the basis spectra that then are integrated into a DNN to directly reconstruct the magnitude spectrograms of speech and noise. Instead of predicting activation coefficients inferred by NMF, which is used as an intermediate target by the previous methods, DNN directly optimizes an actual separation objective in our system, so that the accumulated errors could be alleviated. Moreover, we explore a discriminative training objective with sparsity constraints to suppress noise and preserve more speech components further. Systematic experiments show that the proposed models are competitive with the previous methods.
What problem does this paper attempt to address?