Single-Channel Speech Separation Based on Non-negative Matrix Factorization and Factorial Conditional Random Field
Li Xu, Tu Ming, Wang Xiaofei, Wu Chao, Fu Qiang, Yan Yonghong
DOI: https://doi.org/10.1049/cje.2018.06.016
IF: 1.019
2018-01-01
Chinese Journal of Electronics
Abstract: A new Non-negative matrix factorization (NMF) based algorithm is proposed for single-channel speech separation with a priori known speakers, which aims to better model the spectral structure and temporal continuity of the speech signal. First, NMF and k-means clustering are employed to obtain, for each speaker, multiple small dictionaries as well as a state sequence that describes the temporal dynamics between these dictionaries. Then, a Factorial conditional random field (FCRF) model is trained using the state sequences and dictionaries to jointly model the temporal continuity of the two speakers’ mixed signal for separation. Experiments show that the proposed algorithm outperforms the baselines on all metrics: sparse NMF (+1.12 dB SDR, +2.37 dB SIR, +0.40 dB SAR, +0.2 MOS), the non-negative factorial hidden Markov model (+2.04 dB SDR, +4.26 dB SIR, +0.62 dB SAR, +1.0 MOS) and standard NMF (+2.8 dB SDR, +5.08 dB SIR, +1.06 dB SAR, +1.2 MOS).
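The first stage described in the abstract (NMF plus k-means giving several small dictionaries and a state sequence per speaker) might be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not say what is clustered or how the two steps are combined, so the sketch assumes that normalized spectrogram frames are clustered, that one small dictionary is learned per cluster, and that the cluster labels serve as the state sequence. The function names, the Euclidean multiplicative updates, and the parameters `n_states` and `atoms_per_state` are all hypothetical choices, and the FCRF stage is not covered.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Euclidean NMF with multiplicative updates, so that V is approximately W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def train_speaker_model(V, n_states=4, atoms_per_state=5):
    """V: non-negative magnitude spectrogram, shape (n_freq, n_frames), one speaker.

    Assumption (not from the paper): frames are clustered by spectral shape,
    and one small NMF dictionary is learned from the frames of each cluster.
    """
    frames = V.T / (np.linalg.norm(V.T, axis=1, keepdims=True) + 1e-9)
    states, _ = kmeans(frames, n_states)  # one state label per frame
    dictionaries = {}
    for s in range(n_states):
        V_s = V[:, states == s]
        if V_s.shape[1] == 0:
            continue  # skip states that received no frames
        W_s, _ = nmf(V_s, atoms_per_state)
        dictionaries[s] = W_s
    return dictionaries, states
```

Under these assumptions, `train_speaker_model` would be called once per known speaker on that speaker's training spectrogram; the returned `states` array is the kind of per-frame label sequence on which a temporal model could then be trained.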