Abstract: Speech enhancement (SE) by transient noise suppression is the process of estimating the desired speech signal from a signal corrupted by transient noise (TN). It has important applications in speaker verification and identification, voice-based biometric systems, hearing aids, video conferencing, and many other areas. Enhancing TN-corrupted signals is both important and challenging because TN is highly random, has high short-time energy, and is spread widely across the frequency domain. This paper presents a novel approach to SE that combines sequential sparse nonnegative matrix factorization (SNMF), enhanced dictionary learning (DL), and Gini index (GI)-based fusion in a multi-step procedure. The approach is semi-supervised: the TN used for dictionary training is extracted from the noisy signal itself using the optimally-modified log-spectral amplitude (OMLSA) estimator and is then purified. Both the dictionary of the noisy signal and a dictionary of external clean speech take part in the DL process so that the strengths of the two are merged. First, the dictionary of the noisy signal is obtained through SNMF and decomposed into speech-dominant and noise-dominant submatrices using semi-supervised learning. The speech-dominant submatrix is then combined with the clean speech dictionary to construct an enhanced speech dictionary. The enhanced dictionary and the external clean speech dictionary are each used for SE in the testing phase, yielding two different initial estimates of the signal. To improve on the accuracy of these initial estimates, we apply the GI to fuse them into the final estimate. Furthermore, the phase of the noisy signal is also enhanced. Across several evaluation measures, the results show a significant improvement.
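
The two building blocks named in the abstract, sparse NMF and the Gini index, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not state the SNMF cost function, the sequential scheme, or how the GI is used to combine the two estimates. The sketch assumes a Euclidean cost with an L1 penalty on the activations, and a per-frame weighting in which the estimate with the sparser spectrum (higher GI) receives more weight. The function names `sparse_nmf`, `gini_index`, and `fuse_by_gini`, and all parameter values, are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, seed=0):
    """Approximate V (freq x frames) as W @ H with an L1 penalty on H.

    Assumption: Euclidean cost with multiplicative updates; the paper's
    actual SNMF variant is not specified in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        # The sparsity term in the denominator shrinks the activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        W *= (V @ H.T) / (W @ (H @ H.T) + EPS)
        # Rescale to unit-norm dictionary atoms without changing W @ H.
        norms = np.linalg.norm(W, axis=0, keepdims=True) + EPS
        W /= norms
        H *= norms.T
    return W, H


def gini_index(x):
    """Gini index of a vector's magnitudes: 0 for a flat vector,
    1 - 1/N for a vector with a single nonzero entry."""
    c = np.sort(np.abs(np.ravel(x)))  # ascending order
    n = c.size
    total = c.sum()
    if total <= 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum((c / total) * (n - k + 0.5) / n))


def fuse_by_gini(S1, S2):
    """Blend two magnitude-spectrogram estimates frame by frame.

    Assumption: weights are proportional to each frame's Gini index;
    this fusion rule is illustrative, not taken from the paper.
    """
    g1 = np.array([gini_index(S1[:, t]) for t in range(S1.shape[1])])
    g2 = np.array([gini_index(S2[:, t]) for t in range(S2.shape[1])])
    w1 = (g1 + EPS) / (g1 + g2 + 2 * EPS)  # one weight per frame
    return w1 * S1 + (1.0 - w1) * S2
```

In this sketch, `S1` and `S2` would stand for the two initial estimates obtained with the enhanced dictionary and the external clean speech dictionary, respectively. The Gini index is a convenient sparsity measure for such a comparison because it is normalized and insensitive to the overall scale of the spectrum.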
