Abstract: Speech enhancement (SE) by transient noise suppression is the process of estimating the desired speech signal from a signal corrupted by transient noise (TN). It has important applications in speaker verification and identification, voice-based biometric systems, hearing aids, video conferencing, and many other areas. Enhancing TN-corrupted signals is both important and challenging because transient noise is highly random, has high short-time energy, and is spread widely across the frequency domain. This paper presents a novel multi-step approach to SE that combines sequential sparse nonnegative matrix factorization (SNMF), enhanced dictionary learning (DL), and Gini index (GI)-based fusion. The approach is semi-supervised: the TN used for dictionary training is extracted from the noisy signal itself using the optimally modified log-spectral amplitude (OMLSA) estimator and is then purified. The DL process uses both the dictionary of the noisy signal and a dictionary of external clean speech, so that the strengths of the two are merged. First, the dictionary of the noisy signal is obtained through SNMF and decomposed into speech-dominant and noise-dominant submatrices using semi-supervised learning. Next, the speech-dominant submatrix is combined with the clean speech dictionary to construct an enhanced speech dictionary. The enhanced dictionary and the external clean speech dictionary are then each used for SE in the testing phase, yielding two different initial estimates of the speech signal. To improve on these initial estimates, we apply the GI to fuse them into the final estimate. The phase of the noisy signal is also enhanced. The proposed method shows significant improvement across several evaluation measures.
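The two building blocks named in the abstract, sparse NMF and the Gini index, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Euclidean cost, the L1 penalty weight `sparsity`, the column-normalisation heuristic, and the function names `sparse_nmf`, `gini_index`, and `gini_fuse` are all assumptions made for illustration. In particular, the abstract does not state how the GI is used to combine the two estimates, so the per-frame weighting in `gini_fuse` is purely hypothetical.

```python
import numpy as np


def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (freq x frames) as W @ H.

    Assumption: Euclidean cost with an L1 penalty on the activations H,
    minimised with multiplicative updates. The paper's exact SNMF variant
    is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # The L1 penalty enters the H update as a constant in the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Heuristic: rescale dictionary atoms to unit L2 norm and move the
        # scale into H, which leaves the product W @ H unchanged.
        norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
        W /= norms
        H *= norms.T
    return W, H


def gini_index(x, eps=1e-12):
    """Gini index as a sparsity measure: 0 for a flat vector, 1 - 1/N for one-hot."""
    c = np.sort(np.abs(np.ravel(x)))  # magnitudes in ascending order
    n = c.size
    total = c.sum()
    if total < eps:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / total) * ((n - k + 0.5) / n))


def gini_fuse(est_a, est_b, eps=1e-12):
    """Hypothetical fusion rule: per-frame convex combination of two magnitude
    spectrogram estimates, weighted by each frame's Gini index."""
    ga = np.array([gini_index(est_a[:, t]) for t in range(est_a.shape[1])])
    gb = np.array([gini_index(est_b[:, t]) for t in range(est_b.shape[1])])
    wa = (ga + eps) / (ga + gb + 2.0 * eps)  # weight per frame, in [0, 1]
    return wa * est_a + (1.0 - wa) * est_b
```

In this sketch the sparser of the two estimates in a given frame receives the larger weight, on the assumption that a cleaner speech spectrum is sparser than one with residual broadband transient noise. Other fusion rules are equally plausible.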
