Single-channel Speech Separation with Non-Negative Matrix Factorization and Factorial Conditional Random Fields

LI Xu, TU Ming, WU Chao, GUO Yanmeng, NA Yueyue, FU Qiang, YAN Yonghong
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2017.21.016
2017-01-01
Abstract: Non-negative matrix factorization (NMF) has been extensively used for single-channel speech separation. However, standard NMF-based methods assume that the time frames of the speech signal are independent and therefore cannot model its temporal continuity. This paper presents an algorithm for single-channel speech separation based on NMF and factorial conditional random fields (FCRF). A model is developed by combining NMF with k-means clustering; it describes both the spectral structure and the temporal continuity of the speech signal. This model is then used to train the FCRF, which separates the mixed speech signal. Tests show that the algorithm consistently improves separation performance compared with the active-set Newton algorithm, an NMF-based approach that does not consider the temporal dynamics of the speech signal.
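For orientation, the following is a minimal sketch of the kind of frame-independent NMF separation that the abstract contrasts against. It is not the paper's algorithm: the k-means and FCRF stages are not shown, and plain multiplicative updates for the Euclidean cost stand in for the active-set Newton algorithm. The function names, the per-speaker dictionaries `W1` and `W2`, and the soft-mask reconstruction are all assumptions made for illustration.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H so that V ~= W @ H, with W held fixed.

    V: (freq, frames) magnitude spectrogram of the mixture.
    W: (freq, atoms) non-negative dictionary.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; H stays non-negative.
        # Each column (time frame) of H is updated independently of the others,
        # which is the frame-independence assumption noted in the abstract.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def separate(V_mix, W1, W2, eps=1e-12):
    """Split a mixture spectrogram using two pre-trained speaker dictionaries."""
    W = np.hstack([W1, W2])          # concatenate the (hypothetical) dictionaries
    H = nmf_activations(V_mix, W)
    k1 = W1.shape[1]
    S1 = W1 @ H[:k1]                 # speaker-1 part of the reconstruction
    S2 = W2 @ H[k1:]                 # speaker-2 part of the reconstruction
    mask1 = S1 / (S1 + S2 + eps)     # soft mask with values in [0, 1]
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

Because the activations are fitted one frame at a time, nothing in this sketch ties a frame to its neighbours; the abstract's combined NMF, k-means, and FCRF model is aimed at that gap.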