Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization.
Hossain Md. Imran, Islam Md. Shohidul, Khatun Mst. Titasa, Ullah Rizwan, Masood Asim, Ye Zhongfu
DOI: https://doi.org/10.1007/s00034-020-01564-x
IF: 2.311
2020-01-01
Circuits Systems and Signal Processing
Abstract: In this article, we propose a new source separation method in which the dual-tree complex wavelet transform (DTCWT) and the short-time Fourier transform (STFT) are applied sequentially as dual transforms, and sparse nonnegative matrix factorization (SNMF) is used to factorize the magnitude spectrum. STFT-based source separation suffers from a trade-off between time and frequency resolution: it cannot determine exactly which frequencies are present at which time. Discrete wavelet transform (DWT)-based source separation suffers from shift variance (i.e., a small shift in the time-domain signal causes significant variation in the energy of the wavelet coefficients). To address these issues, we utilize the DTCWT, which comprises two-level trees with different sets of filters and provides additional information for analysis as well as approximate shift invariance; these properties enable the perfect reconstruction of the time-domain signal. First, the time-domain signal is transformed into a set of subband signals in which the low- and high-frequency components are isolated. Next, each subband is passed through the STFT to construct a complex spectrogram. Then, SNMF is applied to decompose the magnitude part into a weighted linear combination of the trained basis vectors of both sources. Finally, the estimated signals are obtained by applying a subband binary ratio mask, followed by the inverse STFT (ISTFT) and the inverse DTCWT (IDTCWT). The proposed method is evaluated on speech separation tasks using the GRID audiovisual and TIMIT corpora. The experimental findings indicate that the proposed approach outperforms existing methods.
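The factorization and masking steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a Euclidean cost with an L1 penalty on the activations and multiplicative updates, it uses a soft ratio mask (a binary variant could threshold it at 0.5), and it operates on a single magnitude spectrogram, leaving out the DTCWT and STFT stages. The function names sparse_nmf and separate and all parameter values are hypothetical.

```python
import numpy as np

def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, W=None, seed=0):
    """Approximate nonnegative V (freq x time) as W @ H with an L1 penalty on H.

    If W is supplied, it is treated as a trained basis and held fixed,
    so only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-9
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for H; the sparsity term shrinks activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            # Normalize basis columns and rescale H so W @ H is unchanged.
            norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
            W /= norms
            H *= norms.T
    return W, H

def separate(V_mix, W1, W2, sparsity=0.1, n_iter=200):
    """Split a mixture magnitude spectrogram using two trained bases."""
    W = np.hstack([W1, W2])          # concatenated basis for both sources
    _, H = sparse_nmf(V_mix, W.shape[1], sparsity, n_iter, W=W)
    k = W1.shape[1]
    S1 = W1 @ H[:k]                  # initial estimate of source 1
    S2 = W2 @ H[k:]                  # initial estimate of source 2
    mask1 = S1 / (S1 + S2 + 1e-9)    # soft ratio mask with values in [0, 1]
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

In a full pipeline of the kind the abstract outlines, W1 and W2 would be learned from the training spectrograms of each source, separate would be called once per subband, and the masked magnitudes would be recombined with the mixture phase before the inverse transforms.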